!pip install graphviz
!pip install matplotlib --upgrade
!pip install scikit-learn --upgrade
!pip install dtreeviz
Successfully installed graphviz-0.20
Successfully installed fonttools-4.33.3 matplotlib-3.5.2 packaging-21.3 pillow-9.1.0
Successfully installed joblib-1.1.0 scikit-learn-1.0.2 threadpoolctl-3.1.0
Successfully installed attrs-21.4.0 colour-0.1.5 dtreeviz-1.3.6 importlib-metadata-4.11.3 iniconfig-1.1.1 pluggy-1.0.0 py-1.11.0 pytest-7.1.2 tomli-2.0.1 typing-extensions-4.2.0 zipp-3.8.0
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import graphviz

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split, cross_val_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler, OneHotEncoder, normalize
from sklearn.pipeline import Pipeline
from sklearn.metrics import (classification_report, confusion_matrix, roc_auc_score,
                             ConfusionMatrixDisplay, accuracy_score, auc, roc_curve,
                             explained_variance_score, r2_score)
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif
from sklearn.dummy import DummyClassifier

import random
%matplotlib inline

# set max columns to none
pd.set_option("display.max_columns", None)
 
# raise max column width
pd.set_option("display.max_colwidth", 100)


##set seed for reproducibility
random.seed(21)
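Note that `random.seed` only seeds Python's built-in `random` module; NumPy and scikit-learn draw from `numpy.random` unless an explicit `random_state` is passed (as the splits below do). A fuller reproducibility setup, sketched here rather than taken from the original notebook, would seed both:

```python
import random
import numpy as np

SEED = 21

def set_seeds(seed: int = SEED) -> None:
    """Seed both Python's and NumPy's RNGs so that any library call
    relying on either global state is repeatable across runs."""
    random.seed(seed)
    np.random.seed(seed)

set_seeds()
```

Passing `random_state=` directly to each estimator or splitter, as done later in this notebook, is still the most robust option.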
ipeds = pd.read_csv("/dsa/groups/capstonesp2022/on-campus/group_2/IPEDSApril26_updated.csv")
ipeds = ipeds.drop(columns = ['Unnamed: 0'])
ipeds.head()
[ipeds.head(): 5 rows × 31 columns — unitid, year, est_fte, per-student expenditure shares (exp_*_per), ACT composite 25th/75th percentiles, completion_rate_150pct, acceptance_rate, cc_basic_2010, STATE, the three graduation-rate target encodings, and OrgStructure/Carnegie dummy columns]
ipeds = pd.get_dummies(ipeds, columns=["STATE"], prefix=["US"] )
dropped = ipeds.dropna()
print(dropped.shape)
dropped.head()
(15648, 78)
[dropped.head(): 5 rows × 78 columns — the columns above, with STATE replaced by one-hot US_* state dummies]
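`dropna()` discards every row containing any missing value; before doing that wholesale, it can help to see which columns drive the missingness. A small sketch (the toy frame below stands in for the IPEDS data):

```python
import numpy as np
import pandas as pd

def missingness_report(df: pd.DataFrame) -> pd.Series:
    """Count NaNs per column, largest first, keeping only
    columns that actually have missing values."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)

# toy frame standing in for the IPEDS DataFrame
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                    "b": [np.nan, np.nan, 1.0],
                    "c": [1, 2, 3]})
print(missingness_report(toy))
```

If one or two columns account for most of the dropped rows, imputing or excluding those columns may preserve far more observations than row-wise deletion.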
##split into train and a test/validation holdout
train, testVal = train_test_split(dropped, test_size=0.25, random_state = 21)
##split the holdout evenly into test and validation
test, validation = train_test_split(testVal, test_size = .5, random_state = 21)
print(train.shape)
print(test.shape)
print(validation.shape)
(11736, 78)
(1956, 78)
(1956, 78)
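The two-stage split above yields a 75% / 12.5% / 12.5% layout. The same pattern can be wrapped in a reusable helper; this is a sketch under the notebook's proportions, with `three_way_split` being an illustrative name, not a library function:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def three_way_split(data, test_frac=0.125, val_frac=0.125, seed=21):
    """Two-stage split: carve off the test+validation holdout,
    then divide that holdout into its test and validation parts."""
    hold_frac = test_frac + val_frac
    train, hold = train_test_split(data, test_size=hold_frac, random_state=seed)
    test, val = train_test_split(hold, test_size=val_frac / hold_frac, random_state=seed)
    return train, test, val

# demo on a dummy array of 1600 rows -> 1200 / 200 / 200
demo = np.arange(1600).reshape(-1, 1)
tr, te, va = three_way_split(demo)
print(len(tr), len(te), len(va))
```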
X_train = train.loc[:, ~dropped.columns.isin(['unitid', 'Grad_Rates_Two_Classes', 'exp_acad_inst_student_total_per','Grad_Rates_Three_Classes', 'Grad_Rates_Quartiles', 'completion_rate_150pct', 'year', 'cc_basic_2010'])]

X_test = test.loc[:, ~dropped.columns.isin(['unitid', 'Grad_Rates_Two_Classes', 'exp_acad_inst_student_total_per','Grad_Rates_Three_Classes', 'Grad_Rates_Quartiles', 'completion_rate_150pct', 'year', 'cc_basic_2010'])]

y_train = train.completion_rate_150pct

y_test = test.completion_rate_150pct

X_train.head(1)
[X_train.head(1): 1 row × 70 predictor columns — the identifier, year, and outcome columns listed above have been removed]
print(np.all(np.isfinite(X_train)))
print(np.any(np.isnan(X_train)))
True
False

Looking at Multicollinearity

corrMatrix=X_train.corr()
corrMatrix.style.background_gradient(cmap='coolwarm')
[X_train.corr() rendered as a 70 × 70 styled heatmap. Notable pairs: act_composite_75_pctl / act_composite_25_pctl at r ≈ 0.93; OrgStructure_Independent / OrgStructure_MultiOrg at r ≈ −0.95; exp_instruc_total_per / exp_inst_supp_total_per at r ≈ 0.73; est_fte / Carnegie_Research_Universities at r ≈ 0.54]
Carnegie_Tribal_Colleges -0.0119945 -0.00694348 0.000149355 0.0245916 -0.00389246 -0.0306791 -0.0233072 -0.00461198 0.00226677 -0.00287615 -0.00137893 -0.00181249 -0.00219817 -0.00946489 -0.000902987 -0.000886678 -0.0111745 -0.01442 -0.00172894 -0.00589337 -0.0023045 1 -0.00259321 -0.00220496 -0.00107983 -0.00472061 -0.00211513 -0.00166806 -0.000818227 -0.00319421 -0.0033222 -0.0028442 -0.00144056 -0.00382546 -0.00330351 0.140593 -0.00263943 -0.00241886 -0.00367426 -0.00213617 -0.00150058 -0.00318455 -0.00302121 -0.00334543 -0.00210807 -0.00133469 -0.00371255 -0.00143032 -0.00207242 -0.00128997 -0.00195422 -0.00132365 -0.000950253 -0.00481421 -0.00393968 -0.00236233 -0.00205076 -0.00480043 -0.00130129 -0.00308648 -0.00177119 -0.00348643 -0.00423307 -0.00121987 -0.0032326 -0.00139915 -0.00208675 -0.00311125 -0.0022252 -0.000539224
US_AL 0.0164678 -0.0228034 -0.0115194 -0.0212787 -0.0160175 -0.0296713 -0.0431571 0.00551498 0.00290429 -0.00560256 -0.00118192 -0.0137856 0.0201753 -0.014073 -0.00686802 -0.00674398 -0.00118798 0.00510739 -0.0131501 0.0213882 -0.0124934 -0.00259321 1 -0.0167707 -0.00821307 -0.0359045 -0.0160874 -0.0126871 -0.00622335 -0.0242948 -0.0252683 -0.0216327 -0.0109567 -0.029096 -0.0251262 -0.0184448 -0.0200752 -0.0183976 -0.027946 -0.0162475 -0.0114132 -0.0242214 -0.022979 -0.025445 -0.0160337 -0.0101515 -0.0282372 -0.0108788 -0.0157626 -0.00981135 -0.0148636 -0.0100675 -0.00722752 -0.0366164 -0.0299648 -0.0179676 -0.0155978 -0.0365116 -0.00989746 -0.0234754 -0.0134715 -0.0265174 -0.0321963 -0.00927822 -0.0245868 -0.0106418 -0.0158716 -0.0236639 -0.0169246 -0.00410128
US_AR -0.0168913 -0.0342289 -0.0338901 -0.0449249 -0.0404734 0.00363061 -0.0169955 0.0295846 -0.00356936 0.0480255 -0.011095 -0.0117216 0.010446 0.0297153 -0.00583975 -0.00573427 -0.00854606 -0.000757399 -0.0111813 -0.0206236 -0.00312522 -0.00220496 -0.0167707 1 -0.00698341 -0.0305289 -0.0136788 -0.0107876 -0.00529159 -0.0206574 -0.0214852 -0.0183939 -0.0093163 -0.0247398 -0.0213643 -0.0156833 -0.0170696 -0.0156431 -0.023762 -0.013815 -0.00970446 -0.020595 -0.0195386 -0.0216354 -0.0136332 -0.00863163 -0.0240096 -0.00925006 -0.0134027 -0.0083424 -0.0126382 -0.00856022 -0.00614542 -0.0311342 -0.0254785 -0.0152775 -0.0132625 -0.0310451 -0.00841562 -0.0199607 -0.0114545 -0.0225473 -0.0273759 -0.0078891 -0.0209057 -0.00904851 -0.0134953 -0.0201209 -0.0143907 -0.00348724
US_AZ 0.112321 -0.0049527 0.000897928 -0.0242059 -0.0153829 0.0152543 0.00270699 0.0345737 -0.0222806 0.000504609 0.0220222 -0.0057404 -0.00696192 -0.00476533 -0.00285989 -0.00280823 -0.0221886 0.00556547 -0.00547579 0.0419028 -0.00729867 -0.00107983 -0.00821307 -0.00698341 1 -0.0149508 -0.00669891 -0.00528297 -0.00259144 -0.0101165 -0.0105219 -0.00900799 -0.00456245 -0.0121158 -0.0104627 -0.00768053 -0.00835945 -0.00766088 -0.0116369 -0.00676557 -0.00475254 -0.0100859 -0.00956861 -0.0105954 -0.00667654 -0.00422715 -0.0117582 -0.00453001 -0.00656366 -0.00408551 -0.0061893 -0.00419218 -0.00300958 -0.0152473 -0.0124775 -0.00748183 -0.00649503 -0.0152036 -0.00412136 -0.00977531 -0.0056096 -0.011042 -0.0134067 -0.00386351 -0.0102381 -0.00443131 -0.00660903 -0.00985378 -0.00704753 -0.0017078
US_CA 0.120395 0.092938 0.0682382 0.0935616 0.113771 0.0533357 0.0419244 -0.1141 -0.0523269 0.020885 0.0457118 0.0187316 -0.0304349 -0.0505726 -0.0125024 -0.0122765 0.0504968 0.00237104 -0.0239381 0.00298562 0.0114334 -0.00472061 -0.0359045 -0.0305289 -0.0149508 1 -0.0292851 -0.0230952 -0.0113288 -0.0442257 -0.0459977 -0.0393795 -0.0199453 -0.0529656 -0.045739 -0.0335764 -0.0365444 -0.0334905 -0.0508722 -0.0295765 -0.0207763 -0.0440919 -0.0418304 -0.0463193 -0.0291874 -0.0184795 -0.0514023 -0.0198035 -0.0286939 -0.0178603 -0.0270573 -0.0183266 -0.0131568 -0.0666554 -0.0545471 -0.0327077 -0.0283939 -0.0664646 -0.0180171 -0.042734 -0.0245231 -0.0482716 -0.0586092 -0.0168898 -0.0447571 -0.019372 -0.0288922 -0.043077 -0.0308092 -0.00746585
US_CO 0.0482039 0.00750358 0.0233892 -0.03048 -0.0274253 0.0269821 0.0309314 0.0235771 -0.0132715 -0.0178427 0.0186512 -0.0112441 -0.0136367 0.00440397 -0.00560183 -0.00550066 -0.0165809 -0.00245897 -0.0107258 0.0440925 -0.0142963 -0.00211513 -0.0160874 -0.0136788 -0.00669891 -0.0292851 1 -0.0103481 -0.00507601 -0.0198158 -0.0206098 -0.0176445 -0.00893674 -0.0237319 -0.0204939 -0.0150443 -0.0163741 -0.0150058 -0.0227939 -0.0132521 -0.00930909 -0.0197559 -0.0187426 -0.0207539 -0.0130777 -0.00827997 -0.0230314 -0.00887321 -0.0128566 -0.00800253 -0.0121234 -0.00821147 -0.00589505 -0.0298658 -0.0244405 -0.0146551 -0.0127222 -0.0297803 -0.00807276 -0.0191475 -0.0109879 -0.0216287 -0.0262606 -0.00756769 -0.020054 -0.00867987 -0.0129455 -0.0193012 -0.0138044 -0.00334517
US_CT 0.00338628 0.115846 0.0844693 0.160667 0.0666706 0.0344512 0.0512877 -0.0652013 0.0158803 -0.00781834 -0.0134213 -0.00886743 -0.0107544 -0.020538 -0.00441778 -0.00433799 0.0312011 -0.00389948 -0.00845867 0.00399589 -0.0112745 -0.00166806 -0.0126871 -0.0107876 -0.00528297 -0.0230952 -0.0103481 1 -0.0040031 -0.0156274 -0.0162536 -0.013915 -0.0070478 -0.0187157 -0.0161621 -0.0118644 -0.0129132 -0.0118341 -0.017976 -0.010451 -0.00734144 -0.0155801 -0.014781 -0.0163672 -0.0103135 -0.00652985 -0.0181633 -0.00699769 -0.0101391 -0.00631104 -0.00956086 -0.00647583 -0.00464902 -0.0235531 -0.0192745 -0.0115575 -0.0100331 -0.0234857 -0.00636644 -0.0151003 -0.00866537 -0.0170571 -0.0207099 -0.00596812 -0.0158152 -0.00684522 -0.0102092 -0.0152215 -0.0108866 -0.0026381
US_DE 0.0221385 0.00207524 -0.010923 -0.00635275 -0.00778934 -0.0243341 -0.0171171 -0.0283426 -0.00426561 -0.00690236 0.00635081 -0.00434972 -0.0052753 0.0010167 -0.00216704 -0.0021279 0.00362983 -0.0147476 -0.00414921 0.025765 -0.00553047 -0.000818227 -0.00622335 -0.00529159 -0.00259144 -0.0113288 -0.00507601 -0.0040031 1 -0.00766567 -0.00797282 -0.00682568 -0.00345714 -0.00918056 -0.00792797 -0.00581983 -0.00633426 -0.00580493 -0.00881771 -0.00512652 -0.00360118 -0.00764248 -0.00725049 -0.00802856 -0.00505907 -0.00320307 -0.00890959 -0.00343256 -0.00497353 -0.00309574 -0.00468986 -0.00317657 -0.00228047 -0.0115534 -0.00945469 -0.00566926 -0.00492153 -0.0115204 -0.00312291 -0.00740712 -0.0042506 -0.00836695 -0.0101588 -0.00292752 -0.00775779 -0.00335777 -0.00500791 -0.00746658 -0.00534017 -0.00129406
US_FL 0.0824277 -0.0134509 -0.0121219 -0.0264442 -0.0108875 -0.0171607 -0.00294737 -0.0757365 -0.00994111 -0.000258661 0.0099734 -0.0169805 -0.0205939 -0.0124403 -0.00845975 -0.00830696 -0.000469391 -0.0116944 0.0111292 0.0271012 0.0568431 -0.00319421 -0.0242948 -0.0206574 -0.0101165 -0.0442257 -0.0198158 -0.0156274 -0.00766567 1 -0.0311245 -0.0266463 -0.0134961 -0.0358393 -0.0309494 -0.0227196 -0.0247278 -0.0226614 -0.0344228 -0.020013 -0.0140584 -0.0298349 -0.0283046 -0.0313421 -0.0197497 -0.0125042 -0.0347815 -0.0134001 -0.0194158 -0.0120852 -0.0183084 -0.0124008 -0.00890257 -0.0451026 -0.0369094 -0.0221318 -0.0192128 -0.0449735 -0.0121913 -0.0289161 -0.0165936 -0.0326631 -0.0396581 -0.0114285 -0.030285 -0.0131081 -0.01955 -0.0291482 -0.0208471 -0.00505179
US_GA -0.00897759 -0.0230337 -0.0220498 -0.0230657 -0.0105891 -0.0802817 -0.0552827 -0.0618584 -0.0291415 -0.0280253 0.0375525 0.0125121 0.078544 0.0270121 -0.00879872 -0.0086398 -0.0338279 0.0076487 -0.0168468 -0.0169157 -0.0224551 -0.0033222 -0.0252683 -0.0214852 -0.0105219 -0.0459977 -0.0206098 -0.0162536 -0.00797282 -0.0311245 1 -0.0277139 -0.0140368 -0.0372753 -0.0321895 -0.0236299 -0.0257186 -0.0235694 -0.0358021 -0.0208149 -0.0146217 -0.0310303 -0.0294388 -0.0325979 -0.020541 -0.0130052 -0.0361751 -0.013937 -0.0201937 -0.0125695 -0.019042 -0.0128976 -0.00925928 -0.0469097 -0.0383883 -0.0230186 -0.0199826 -0.0467755 -0.0126798 -0.0300747 -0.0172585 -0.0339719 -0.0412471 -0.0118865 -0.0314985 -0.0136333 -0.0203333 -0.0303161 -0.0216824 -0.0052542
US_IA -0.0426974 -0.0306683 -0.0234397 0.0131243 -0.036195 0.0195911 0.0147834 0.0451357 0.0326028 -0.023993 -0.0251322 -0.0151199 -0.0183372 0.0717102 -0.00753276 -0.00739671 -0.0497498 0.00578686 -0.00222082 -0.0296123 0.0268574 -0.0028442 -0.0216327 -0.0183939 -0.00900799 -0.0393795 -0.0176445 -0.013915 -0.00682568 -0.0266463 -0.0277139 1 -0.0120172 -0.0319121 -0.0275581 -0.02023 -0.0220182 -0.0201783 -0.0306508 -0.0178201 -0.0125179 -0.0265657 -0.0252031 -0.0279077 -0.0175856 -0.011134 -0.0309702 -0.0119318 -0.0172883 -0.010761 -0.0163022 -0.0110419 -0.00792705 -0.0401604 -0.032865 -0.0197066 -0.0171075 -0.0400454 -0.0108554 -0.0257475 -0.0147753 -0.029084 -0.0353124 -0.0101762 -0.0269665 -0.0116718 -0.0174078 -0.0259542 -0.0185627 -0.00449823
US_ID 0.0319785 -0.0228689 -0.0214807 -0.0443549 -0.0262104 0.000376382 -0.012129 0.0675054 0.0151128 -0.0121522 -0.0113356 -0.00765805 -0.00928762 0.0168546 -0.00381526 -0.00374635 -0.022407 0.00023277 -0.00730504 0.00544748 0.0349704 -0.00144056 -0.0109567 -0.0093163 -0.00456245 -0.0199453 -0.00893674 -0.0070478 -0.00345714 -0.0134961 -0.0140368 -0.0120172 1 -0.0161632 -0.0139579 -0.0102463 -0.011152 -0.0102201 -0.0155243 -0.00902567 -0.00634018 -0.0134552 -0.0127651 -0.014135 -0.00890691 -0.00563928 -0.0156861 -0.00604331 -0.00875632 -0.00545031 -0.0082569 -0.00559262 -0.00401497 -0.0203408 -0.0166458 -0.00998121 -0.00866477 -0.0202826 -0.00549815 -0.0130409 -0.00748355 -0.0147307 -0.0178854 -0.00515416 -0.0136582 -0.00591163 -0.00881685 -0.0131455 -0.00940183 -0.0022783
US_IL -0.0217072 0.0656537 0.0320517 0.056317 0.050379 0.0289522 0.0177359 0.00736568 0.0379659 -0.0322706 -0.0279455 0.0326066 -0.0246636 -0.0269466 -0.0101316 -0.00994859 0.0130971 0.00532571 -0.00553543 0.000512912 0.0299894 -0.00382546 -0.029096 -0.0247398 -0.0121158 -0.0529656 -0.0237319 -0.0187157 -0.00918056 -0.0358393 -0.0372753 -0.0319121 -0.0161632 1 -0.0370657 -0.0272094 -0.0296146 -0.0271398 -0.0412255 -0.023968 -0.0168366 -0.0357309 -0.0338982 -0.0375359 -0.0236527 -0.0149753 -0.041655 -0.0160482 -0.0232527 -0.0144735 -0.0219265 -0.0148514 -0.0106619 -0.0540158 -0.0442035 -0.0265055 -0.0230096 -0.0538612 -0.0146005 -0.0346305 -0.0198728 -0.039118 -0.0474953 -0.0136871 -0.03627 -0.0156986 -0.0234135 -0.0349085 -0.0249669 -0.00605013
US_IN 0.00268712 -0.0188506 -0.0259501 -0.022251 -0.0283553 0.0320501 0.0126509 0.0639153 0.0120593 -0.0278676 -0.00350322 -0.0175616 -0.0212985 0.029374 0.0617734 0.0427043 -0.0139539 0.00415854 -0.016752 -0.0146811 -0.0223288 -0.00330351 -0.0251262 -0.0213643 -0.0104627 -0.045739 -0.0204939 -0.0161621 -0.00792797 -0.0309494 -0.0321895 -0.0275581 -0.0139579 -0.0370657 1 -0.023497 -0.025574 -0.0234369 -0.0356007 -0.0206978 -0.0145394 -0.0308558 -0.0292732 -0.0324146 -0.0204255 -0.0129321 -0.0359716 -0.0138586 -0.0200802 -0.0124988 -0.0189349 -0.0128251 -0.0092072 -0.0466459 -0.0381724 -0.0228891 -0.0198702 -0.0465124 -0.0126085 -0.0299055 -0.0171614 -0.0337808 -0.0410151 -0.0118196 -0.0313213 -0.0135567 -0.020219 -0.0301456 -0.0215604 -0.00522465
US_KS -0.0429346 -0.0377986 -0.0323062 0.0137563 -0.0189474 -0.0263222 -0.0435481 0.0429855 0.0114684 -0.0204573 -0.00517528 -0.0128917 -0.015635 0.00909247 -0.0064227 -0.0063067 0.00246624 -0.00324892 0.00192296 -0.0168551 0.0212016 0.140593 -0.0184448 -0.0156833 -0.00768053 -0.0335764 -0.0150443 -0.0118644 -0.00581983 -0.0227196 -0.0236299 -0.02023 -0.0102463 -0.0272094 -0.023497 1 -0.0187735 -0.0172047 -0.026134 -0.015194 -0.0106732 -0.0226509 -0.0214891 -0.0237951 -0.0149941 -0.00949329 -0.0264063 -0.0101734 -0.0147406 -0.00917518 -0.0138999 -0.00941475 -0.00675889 -0.0342422 -0.0280219 -0.0168026 -0.0145865 -0.0341442 -0.00925571 -0.0219533 -0.012598 -0.024798 -0.0301087 -0.00867662 -0.0229926 -0.00995178 -0.0148425 -0.0221295 -0.0158273 -0.00383535
US_KY -0.0157403 -0.0312729 -0.0272824 -0.0136057 -0.0218082 -0.0195456 -0.029201 0.00397616 0.0365461 -0.0222656 -0.0295839 -0.0140313 -0.017017 0.0271247 -0.00699043 -0.00686417 0.00138001 -0.00881014 -0.0133845 -0.016224 0.0217574 -0.00263943 -0.0200752 -0.0170696 -0.00835945 -0.0365444 -0.0163741 -0.0129132 -0.00633426 -0.0247278 -0.0257186 -0.0220182 -0.011152 -0.0296146 -0.025574 -0.0187735 1 -0.0187255 -0.0284441 -0.0165371 -0.0116166 -0.0246531 -0.0233886 -0.0258985 -0.0163195 -0.0103324 -0.0287405 -0.0110727 -0.0160436 -0.00998621 -0.0151285 -0.010247 -0.00735633 -0.037269 -0.0304988 -0.0182878 -0.0158758 -0.0371623 -0.0100739 -0.0238938 -0.0137116 -0.02699 -0.0327701 -0.00944358 -0.025025 -0.0108315 -0.0161545 -0.0240856 -0.0172263 -0.00417437
US_LA 0.0083698 -0.0209788 -0.0254707 -0.0539766 -0.0137677 -0.0416538 -0.0212461 -0.0172436 -0.0572418 -0.0204049 0.0631969 -0.0128587 -0.015595 -0.0101041 -0.00640627 -0.00629056 0.00586002 -0.00819579 0.0447567 0.0244278 -0.0163493 -0.00241886 -0.0183976 -0.0156431 -0.00766088 -0.0334905 -0.0150058 -0.0118341 -0.00580493 -0.0226614 -0.0235694 -0.0201783 -0.0102201 -0.0271398 -0.0234369 -0.0172047 -0.0187255 1 -0.0260671 -0.0151551 -0.0106459 -0.0225929 -0.0214341 -0.0237342 -0.0149557 -0.00946899 -0.0263388 -0.0101474 -0.0147029 -0.0091517 -0.0138643 -0.00939066 -0.00674159 -0.0341545 -0.0279502 -0.0167596 -0.0145491 -0.0340568 -0.00923203 -0.0218971 -0.0125657 -0.0247346 -0.0300316 -0.00865442 -0.0229338 -0.00992631 -0.0148045 -0.0220729 -0.0157868 -0.00382554
US_MA -0.034833 0.0684658 0.147776 0.133448 0.139849 0.0661128 0.0984278 -0.0841033 0.0275677 0.0978044 -0.0572684 -0.00121727 -0.000934524 -0.00549498 0.0541417 0.0554868 -0.0169086 -0.00306409 0.0341233 0.0164051 -0.0139674 -0.00367426 -0.027946 -0.023762 -0.0116369 -0.0508722 -0.0227939 -0.017976 -0.00881771 -0.0344228 -0.0358021 -0.0306508 -0.0155243 -0.0412255 -0.0356007 -0.026134 -0.0284441 -0.0260671 1 -0.0230207 -0.0161711 -0.0343187 -0.0325584 -0.0360524 -0.0227178 -0.0143834 -0.0400086 -0.015414 -0.0223337 -0.0139015 -0.0210599 -0.0142644 -0.0102405 -0.0518809 -0.0424564 -0.0254579 -0.0221002 -0.0517324 -0.0140235 -0.0332618 -0.0190874 -0.0375719 -0.0456181 -0.0131461 -0.0348365 -0.0150781 -0.0224881 -0.0335288 -0.0239801 -0.005811
US_MD -0.0120453 0.0529512 -0.00786446 0.00687159 0.0552011 -0.0254216 -0.0177181 -0.0362699 -0.0181331 -0.00329625 0.0190535 -0.00367891 -0.0137724 -0.0280437 -0.00565758 0.119077 0.0411989 -0.0134158 -0.0108325 -0.00342666 -0.0144386 -0.00213617 -0.0162475 -0.013815 -0.00676557 -0.0295765 -0.0132521 -0.010451 -0.00512652 -0.020013 -0.0208149 -0.0178201 -0.00902567 -0.023968 -0.0206978 -0.015194 -0.0165371 -0.0151551 -0.0230207 1 -0.00940172 -0.0199525 -0.0189291 -0.0209605 -0.0132079 -0.00836237 -0.0232606 -0.0089615 -0.0129846 -0.00808216 -0.012244 -0.00829319 -0.00595371 -0.030163 -0.0246837 -0.0148009 -0.0128488 -0.0300766 -0.0081531 -0.019338 -0.0110972 -0.0218439 -0.0265219 -0.00764299 -0.0202535 -0.00876624 -0.0130743 -0.0194933 -0.0139418 -0.00337845
US_ME -0.0257353 -0.00124157 0.00944414 0.0347593 0.00651915 0.0157977 0.0129469 0.0110364 -0.0208603 0.0289822 0.0119231 -0.00797712 -0.00967459 0.0389425 -0.00397422 -0.00390244 -0.0158233 -0.0112488 -0.0076094 0.00321115 -0.0101426 -0.00150058 -0.0114132 -0.00970446 -0.00475254 -0.0207763 -0.00930909 -0.00734144 -0.00360118 -0.0140584 -0.0146217 -0.0125179 -0.00634018 -0.0168366 -0.0145394 -0.0106732 -0.0116166 -0.0106459 -0.0161711 -0.00940172 1 -0.0140158 -0.013297 -0.0147239 -0.00927802 -0.00587423 -0.0163396 -0.0062951 -0.00912115 -0.0056774 -0.00860092 -0.00582564 -0.00418225 -0.0211883 -0.0173393 -0.0103971 -0.00902578 -0.0211276 -0.00572723 -0.0135842 -0.00779535 -0.0153445 -0.0186306 -0.0053689 -0.0142273 -0.00615794 -0.0091842 -0.0136932 -0.00979355 -0.00237323
US_MI 0.0358332 -0.0195168 -0.014961 -0.0389462 -0.0308169 0.024032 0.00755705 0.0337485 0.0329614 0.0366985 -0.0439998 0.00923507 -0.0205316 -0.00568785 0.0854186 -0.00828183 0.0139276 -0.000445563 -0.0161488 -0.00762394 -0.0173848 -0.00318455 -0.0242214 -0.020595 -0.0100859 -0.0440919 -0.0197559 -0.0155801 -0.00764248 -0.0298349 -0.0310303 -0.0265657 -0.0134552 -0.0357309 -0.0308558 -0.0226509 -0.0246531 -0.0225929 -0.0343187 -0.0199525 -0.0140158 1 -0.028219 -0.0312473 -0.01969 -0.0124664 -0.0346763 -0.0133596 -0.0193571 -0.0120487 -0.018253 -0.0123633 -0.00887564 -0.0449662 -0.0367978 -0.0220648 -0.0191547 -0.0448374 -0.0121544 -0.0288286 -0.0165434 -0.0325643 -0.0395381 -0.011394 -0.0301934 -0.0130685 -0.0194909 -0.02906 -0.020784 -0.00503651
US_MN -0.0227757 -0.0238797 -0.014435 0.00450383 -0.0361031 0.054349 0.0424692 0.0478034 -0.0111595 0.0131907 0.00708404 0.0279378 -0.00581281 0.0201908 -0.00800157 -0.00785705 -3.33262e-05 0.00890269 -0.0153205 -0.0411466 -0.00301639 -0.00302121 -0.022979 -0.0195386 -0.00956861 -0.0418304 -0.0187426 -0.014781 -0.00725049 -0.0283046 -0.0294388 -0.0252031 -0.0127651 -0.0338982 -0.0292732 -0.0214891 -0.0233886 -0.0214341 -0.0325584 -0.0189291 -0.013297 -0.028219 1 -0.0296446 -0.0186801 -0.011827 -0.0328977 -0.0126743 -0.0183642 -0.0114307 -0.0173168 -0.0117291 -0.0084204 -0.0426598 -0.0349104 -0.0209331 -0.0181722 -0.0425377 -0.011531 -0.02735 -0.0156949 -0.0308941 -0.0375102 -0.0108096 -0.0286448 -0.0123982 -0.0184912 -0.0275695 -0.019718 -0.00477818
US_MO -0.0348196 0.0259896 -0.0094142 -0.0301886 -0.0148414 0.0419179 0.0273302 0.0243775 0.0286039 0.0388539 -0.0403201 0.00719618 0.0115354 -0.0306482 -0.00886024 0.00143837 -0.015339 0.0077049 0.0144337 0.000865173 0.0999175 -0.00334543 -0.025445 -0.0216354 -0.0105954 -0.0463193 -0.0207539 -0.0163672 -0.00802856 -0.0313421 -0.0325979 -0.0279077 -0.014135 -0.0375359 -0.0324146 -0.0237951 -0.0258985 -0.0237342 -0.0360524 -0.0209605 -0.0147239 -0.0312473 -0.0296446 1 -0.0206847 -0.0130962 -0.036428 -0.0140345 -0.0203349 -0.0126573 -0.0191751 -0.0129878 -0.00932402 -0.0472377 -0.0386567 -0.0231795 -0.0201223 -0.0471025 -0.0127684 -0.030285 -0.0173792 -0.0342094 -0.0415355 -0.0119696 -0.0317187 -0.0137287 -0.0204755 -0.0305281 -0.021834 -0.00529094
US_MS -0.00901266 -0.0284623 -0.0344027 -0.0508904 -0.0255839 -0.0557801 -0.0601957 -0.0276786 -0.0339748 -0.0177831 0.0392393 -0.0112065 0.00573197 -0.0156822 -0.00558313 -0.0054823 0.000891555 0.00124067 -0.01069 0.0314267 -0.0142486 -0.00210807 -0.0160337 -0.0136332 -0.00667654 -0.0291874 -0.0130777 -0.0103135 -0.00505907 -0.0197497 -0.020541 -0.0175856 -0.00890691 -0.0236527 -0.0204255 -0.0149941 -0.0163195 -0.0149557 -0.0227178 -0.0132079 -0.00927802 -0.01969 -0.0186801 -0.0206847 1 -0.00825234 -0.0229545 -0.00884359 -0.0128137 -0.00797581 -0.0120829 -0.00818406 -0.00587538 -0.0297661 -0.0243589 -0.0146062 -0.0126797 -0.0296809 -0.00804582 -0.0190836 -0.0109512 -0.0215565 -0.0261729 -0.00754243 -0.019987 -0.00865089 -0.0129023 -0.0192368 -0.0137583 -0.003334
US_MT -0.00541184 -0.0181119 -0.0163734 -0.0130648 -0.0345764 -0.00335501 -0.010757 0.0579306 -0.0305239 -0.0112591 0.0338148 -0.00709525 -0.00860507 0.0330073 -0.00353488 -0.00347103 -0.0437441 -0.000266005 -0.00676818 0.0342019 -0.0090213 -0.00133469 -0.0101515 -0.00863163 -0.00422715 -0.0184795 -0.00827997 -0.00652985 -0.00320307 -0.0125042 -0.0130052 -0.011134 -0.00563928 -0.0149753 -0.0129321 -0.00949329 -0.0103324 -0.00946899 -0.0143834 -0.00836237 -0.00587423 -0.0124664 -0.011827 -0.0130962 -0.00825234 1 -0.0145333 -0.00559918 -0.00811281 -0.00504976 -0.0076501 -0.00518161 -0.0037199 -0.0188459 -0.0154225 -0.00924768 -0.00802799 -0.018792 -0.00509409 -0.0120825 -0.00693357 -0.0136482 -0.016571 -0.00477537 -0.0126545 -0.00547718 -0.00816889 -0.0121795 -0.00871088 -0.00211087
US_NC -0.016836 -0.0143535 -0.0267402 -0.0331196 0.000101471 -0.1082 -0.0827145 -0.0605069 -0.00819061 -0.0313181 0.0177041 0.00747519 -0.00515431 0.0392297 -0.00983254 -0.00965495 -0.0221354 0.0120448 0.00492513 -0.0306838 -0.0215055 -0.00371255 -0.0282372 -0.0240096 -0.0117582 -0.0514023 -0.0230314 -0.0181633 -0.00890959 -0.0347815 -0.0361751 -0.0309702 -0.0156861 -0.041655 -0.0359716 -0.0264063 -0.0287405 -0.0263388 -0.0400086 -0.0232606 -0.0163396 -0.0346763 -0.0328977 -0.036428 -0.0229545 -0.0145333 1 -0.0155746 -0.0225664 -0.0140463 -0.0212793 -0.0144131 -0.0103472 -0.0524215 -0.0428988 -0.0257232 -0.0223305 -0.0522714 -0.0141696 -0.0336084 -0.0192863 -0.0379634 -0.0460935 -0.0132831 -0.0351995 -0.0152352 -0.0227224 -0.0338781 -0.02423 -0.00587155
US_ND -0.0158209 -0.0124463 -0.0249364 -0.0314623 -0.0263405 -0.0154574 -0.0213143 0.0404728 -0.0373167 -0.0120658 0.0408218 -0.0076036 -0.00922159 0.0120889 -0.00378814 -0.00371972 -0.0144011 -0.0103073 -0.0072531 0.0211205 0.0263512 -0.00143032 -0.0108788 -0.00925006 -0.00453001 -0.0198035 -0.00887321 -0.00699769 -0.00343256 -0.0134001 -0.013937 -0.0119318 -0.00604331 -0.0160482 -0.0138586 -0.0101734 -0.0110727 -0.0101474 -0.015414 -0.0089615 -0.0062951 -0.0133596 -0.0126743 -0.0140345 -0.00884359 -0.00559918 -0.0155746 1 -0.00869406 -0.00541156 -0.0081982 -0.00555286 -0.00398642 -0.0201962 -0.0165274 -0.00991024 -0.00860316 -0.0201384 -0.00545906 -0.0129481 -0.00743034 -0.014626 -0.0177582 -0.00511751 -0.0135611 -0.0058696 -0.00875416 -0.0130521 -0.00933498 -0.00226211
US_NE -0.0360222 -0.0199634 -0.0202269 -0.0222646 -0.0118621 -0.0024092 -0.0148914 0.0120724 -0.000162875 0.0684533 -0.0207159 -0.0110171 -0.0133614 0.0182193 -0.00548874 -0.0053896 -0.033213 0.00109375 0.163414 -0.0145908 -0.00775221 -0.00207242 -0.0157626 -0.0134027 -0.00656366 -0.0286939 -0.0128566 -0.0101391 -0.00497353 -0.0194158 -0.0201937 -0.0172883 -0.00875632 -0.0232527 -0.0200802 -0.0147406 -0.0160436 -0.0147029 -0.0223337 -0.0129846 -0.00912115 -0.0193571 -0.0183642 -0.0203349 -0.0128137 -0.00811281 -0.0225664 -0.00869406 1 -0.00784096 -0.0118786 -0.00804569 -0.00577604 -0.0292628 -0.023947 -0.0143592 -0.0124654 -0.029179 -0.00790978 -0.0187609 -0.010766 -0.021192 -0.0257304 -0.0074149 -0.0196491 -0.00850463 -0.0126841 -0.0189115 -0.0135257 -0.00327763
US_NH -0.0154257 0.00158923 0.0870605 0.0408025 0.00503929 -0.0168065 -0.00799273 0.00889684 -0.00500688 -0.0108818 0.00830233 -0.0068575 -0.00831672 -0.0327909 -0.00341643 -0.00335472 0.0102941 0.0161863 -0.00654139 0.0157839 -0.00871901 -0.00128997 -0.00981135 -0.0083424 -0.00408551 -0.0178603 -0.00800253 -0.00631104 -0.00309574 -0.0120852 -0.0125695 -0.010761 -0.00545031 -0.0144735 -0.0124988 -0.00917518 -0.00998621 -0.0091517 -0.0139015 -0.00808216 -0.0056774 -0.0120487 -0.0114307 -0.0126573 -0.00797581 -0.00504976 -0.0140463 -0.00541156 -0.00784096 1 -0.00739375 -0.00500799 -0.00359526 -0.0182144 -0.0149057 -0.00893781 -0.00775898 -0.0181623 -0.00492339 -0.0116776 -0.00670124 -0.0131908 -0.0160157 -0.00461536 -0.0122305 -0.00529365 -0.00789517 -0.0117713 -0.00841899 -0.00204014
US_NJ -0.007575 0.0188601 0.0471192 0.0617851 0.0348472 -0.00576501 -0.00383764 -0.0116048 0.0228353 -0.00577854 -0.0209658 -0.0103887 -0.0125993 -0.0402114 -0.00517569 -0.00508221 0.0701205 -0.0239126 -0.00990982 0.00837955 -0.0132088 -0.00195422 -0.0148636 -0.0126382 -0.0061893 -0.0270573 -0.0121234 -0.00956086 -0.00468986 -0.0183084 -0.019042 -0.0163022 -0.0082569 -0.0219265 -0.0189349 -0.0138999 -0.0151285 -0.0138643 -0.0210599 -0.012244 -0.00860092 -0.018253 -0.0173168 -0.0191751 -0.0120829 -0.0076501 -0.0212793 -0.0081982 -0.0118786 -0.00739375 1 -0.00758681 -0.0054466 -0.0275938 -0.0225812 -0.0135403 -0.0117544 -0.0275148 -0.00745865 -0.0176909 -0.010152 -0.0199833 -0.0242628 -0.006992 -0.0185284 -0.00801957 -0.0119607 -0.0178329 -0.0127543 -0.00309069
US_NM 0.0102785 -0.00589527 -0.0218641 -0.039038 -0.0179547 -0.0174409 -0.0329125 -0.0266192 -0.00805188 0.0909665 -0.0197303 -0.00703655 0.0525354 -0.0338019 -0.00350563 -0.00344231 -0.00831791 0.0105165 -0.00671219 0.0307409 -0.00894666 -0.00132365 -0.0100675 -0.00856022 -0.00419218 -0.0183266 -0.00821147 -0.00647583 -0.00317657 -0.0124008 -0.0128976 -0.0110419 -0.00559262 -0.0148514 -0.0128251 -0.00941475 -0.010247 -0.00939066 -0.0142644 -0.00829319 -0.00582564 -0.0123633 -0.0117291 -0.0129878 -0.00818406 -0.00518161 -0.0144131 -0.00555286 -0.00804569 -0.00500799 -0.00758681 1 -0.00368913 -0.01869 -0.0152949 -0.00917117 -0.00796157 -0.0186365 -0.00505194 -0.0119825 -0.00687621 -0.0135352 -0.0164339 -0.00473586 -0.0125498 -0.00543187 -0.00810131 -0.0120787 -0.00863881 -0.00209341
US_NV 0.0480413 0.00091142 0.0092364 -0.00909094 -0.00492895 -0.00877426 -0.0122033 0.0321896 -0.0254609 -0.0080161 0.0277865 -0.00505157 -0.00612651 -0.0263795 -0.00251671 -0.00247125 -0.0124052 -0.00254565 -0.00481871 0.0695422 -0.00642285 -0.000950253 -0.00722752 -0.00614542 -0.00300958 -0.0131568 -0.00589505 -0.00464902 -0.00228047 -0.00890257 -0.00925928 -0.00792705 -0.00401497 -0.0106619 -0.0092072 -0.00675889 -0.00735633 -0.00674159 -0.0102405 -0.00595371 -0.00418225 -0.00887564 -0.0084204 -0.00932402 -0.00587538 -0.0037199 -0.0103472 -0.00398642 -0.00577604 -0.00359526 -0.0054466 -0.00368913 1 -0.0134176 -0.0109803 -0.00658402 -0.00571564 -0.0133792 -0.00362681 -0.0086023 -0.00493646 -0.00971701 -0.0117979 -0.0033999 -0.00900955 -0.00389956 -0.00581596 -0.00867135 -0.00620184 -0.00150287
US_NY -0.0187019 0.0791389 0.0567416 0.0336236 0.0996807 0.104802 0.137936 -0.107467 0.0112598 -0.0406115 0.00117937 0.0138975 -0.0191447 -0.0218687 0.0230207 -0.00523481 0.017143 -0.00287028 -0.00185129 0.0143865 -0.00981854 -0.00481421 -0.0366164 -0.0311342 -0.0152473 -0.0666554 -0.0298658 -0.0235531 -0.0115534 -0.0451026 -0.0469097 -0.0401604 -0.0203408 -0.0540158 -0.0466459 -0.0342422 -0.037269 -0.0341545 -0.0518809 -0.030163 -0.0211883 -0.0449662 -0.0426598 -0.0472377 -0.0297661 -0.0188459 -0.0524215 -0.0201962 -0.0292628 -0.0182144 -0.0275938 -0.01869 -0.0134176 1 -0.0556286 -0.0333563 -0.0289568 -0.0677825 -0.0183743 -0.0435813 -0.0250093 -0.0492287 -0.0597713 -0.0172247 -0.0456446 -0.0197561 -0.0294651 -0.0439312 -0.03142 -0.00761388
US_OH -0.0184526 -0.0143462 -0.030313 -0.00345602 -0.0223197 0.014306 0.0161522 0.0312897 0.0459644 0.0903185 -0.0732957 0.0563606 0.049298 -0.00753721 -0.0104341 -0.0102456 -0.0339165 0.00724361 0.0160086 0.01839 -0.0198334 -0.00393968 -0.0299648 -0.0254785 -0.0124775 -0.0545471 -0.0244405 -0.0192745 -0.00945469 -0.0369094 -0.0383883 -0.032865 -0.0166458 -0.0442035 -0.0381724 -0.0280219 -0.0304988 -0.0279502 -0.0424564 -0.0246837 -0.0173393 -0.0367978 -0.0349104 -0.0386567 -0.0243589 -0.0154225 -0.0428988 -0.0165274 -0.023947 -0.0149057 -0.0225812 -0.0152949 -0.0109803 -0.0556286 1 -0.0272969 -0.0236967 -0.0554694 -0.0150365 -0.0356645 -0.0204662 -0.040286 -0.0489135 -0.0140957 -0.037353 -0.0161673 -0.0241126 -0.0359508 -0.0257124 -0.00623078
US_OK -0.012005 -0.0317274 -0.0358096 -0.058984 -0.048913 -0.0323995 -0.0427766 0.0422753 -0.025896 -0.019928 0.0318526 -0.0125582 -0.0152305 -0.00888255 -0.00625654 -0.00614353 0.0188516 0.00334945 -0.0119793 -0.0104534 0.0115741 -0.00236233 -0.0179676 -0.0152775 -0.00748183 -0.0327077 -0.0146551 -0.0115575 -0.00566926 -0.0221318 -0.0230186 -0.0197066 -0.00998121 -0.0265055 -0.0228891 -0.0168026 -0.0182878 -0.0167596 -0.0254579 -0.0148009 -0.0103971 -0.0220648 -0.0209331 -0.0231795 -0.0146062 -0.00924768 -0.0257232 -0.00991024 -0.0143592 -0.00893781 -0.0135403 -0.00917117 -0.00658402 -0.0333563 -0.0272969 1 -0.0142091 -0.0332608 -0.00901625 -0.0213853 -0.012272 -0.0241565 -0.0293297 -0.00845215 -0.0223977 -0.00969431 -0.0144585 -0.021557 -0.0154178 -0.00373612
US_OR -0.015048 -0.00258593 0.00346497 0.0468469 0.00795205 0.0336029 0.0152663 0.0198869 0.00972186 -0.0172997 -0.00439996 -0.0109019 -0.0132217 0.0119483 -0.00543135 -0.00533325 -0.00935111 -0.00347959 -0.0103993 0.0101341 0.0177385 -0.00205076 -0.0155978 -0.0132625 -0.00649503 -0.0283939 -0.0127222 -0.0100331 -0.00492153 -0.0192128 -0.0199826 -0.0171075 -0.00866477 -0.0230096 -0.0198702 -0.0145865 -0.0158758 -0.0145491 -0.0221002 -0.0128488 -0.00902578 -0.0191547 -0.0181722 -0.0201223 -0.0126797 -0.00802799 -0.0223305 -0.00860316 -0.0124654 -0.00775898 -0.0117544 -0.00796157 -0.00571564 -0.0289568 -0.0236967 -0.0142091 1 -0.028874 -0.00782708 -0.0185648 -0.0106535 -0.0209704 -0.0254614 -0.00733738 -0.0194437 -0.00841571 -0.0125515 -0.0187138 -0.0133843 -0.00324336
[Correlation-matrix output truncated: rows US_PA through US_WY of the full 70-variable correlation matrix omitted for brevity.]
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(15,15))
matrix = corrMatrix.round(2)  ##corrMatrix is already a correlation matrix; a second .corr() would correlate the correlations
sns.heatmap(matrix, annot=False, vmax=1, vmin=-1, center=0, cmap='vlag')
plt.savefig('CorrelationMatrix.png',dpi=300)
scores = cross_val_score(LinearRegression(), X_train, y_train, cv = 10)
scores
array([0.73437544, 0.75058678, 0.73165224, 0.75366645, 0.60865991,
       0.76854665, 0.70585779, 0.44220904, 0.71482546, 0.72510093])
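The fold scores above are easier to read as a mean and spread; note that the low eighth fold (0.442) pulls the average down. A quick sketch, with the array copied from the output above:

```python
import numpy as np

scores = np.array([0.73437544, 0.75058678, 0.73165224, 0.75366645, 0.60865991,
                   0.76854665, 0.70585779, 0.44220904, 0.71482546, 0.72510093])
# mean R^2 across the 10 folds, with the fold-to-fold spread
print(f"CV R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```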

Check for the most important features using a random forest

X_train = train.loc[:, ~train.columns.isin(['unitid', 'exp_acad_inst_student_total_per', 'Grad_Rates_Two_Classes', 'Grad_Rates_Three_Classes', 'Grad_Rates_Quartiles', 'completion_rate_150pct', 'year', 'cc_basic_2010'])]

X_test = test.loc[:, ~test.columns.isin(['unitid', 'exp_acad_inst_student_total_per', 'Grad_Rates_Two_Classes', 'Grad_Rates_Three_Classes', 'Grad_Rates_Quartiles', 'completion_rate_150pct', 'year', 'cc_basic_2010'])]

y_train = train.Grad_Rates_Two_Classes

y_test = test.Grad_Rates_Two_Classes

X_train.head(1)
est_fte exp_instruc_total_per exp_acad_supp_total_per exp_student_serv_total_per exp_inst_supp_total_per act_composite_75_pctl act_composite_25_pctl acceptance_rate OrgStructure_Independent OrgStructure_MultiCampus OrgStructure_MultiOrg Carnegie_Art_Music_Design Carnegie_Associate Carnegie_Bachelors Carnegie_Business_Management Carnegie_Engineering Carnegie_Graduate_Professional Carnegie_Missing Carnegie_Other_Tech_Health Carnegie_Research_Universities Carnegie_Theological Carnegie_Tribal_Colleges US_AL US_AR US_AZ US_CA US_CO US_CT US_DE US_FL US_GA US_IA US_ID US_IL US_IN US_KS US_KY US_LA US_MA US_MD US_ME US_MI US_MN US_MO US_MS US_MT US_NC US_ND US_NE US_NH US_NJ US_NM US_NV US_NY US_OH US_OK US_OR US_PA US_RI US_SC US_SD US_TN US_TX US_UT US_VA US_VT US_WA US_WI US_WV US_WY
11618 0.804298 0.092503 0.049112 0.06432 0.055147 0.236842 0.224299 0.54551 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
##anomaly detection and outlier removal
import numpy as np
from sklearn.covariance import EllipticEnvelope
envelope1 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_train)

outliers1 = envelope1.predict(X_train)==-1  


X_clean1 = X_train[~outliers1]  
y_clean1 = y_train[~outliers1]

print(f"Num of outliers = {np.sum(outliers1)}")
/opt/conda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:739: UserWarning: The covariance matrix associated to your dataset is not full rank
  "The covariance matrix associated to your dataset is not full rank"
Num of outliers = 353
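EllipticEnvelope thresholds its robust-covariance scores so that roughly the `contamination` fraction of the training rows is flagged, which is why 3% of the training set (353 rows) is removed here. A minimal sketch on synthetic data (not the project's data):

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # synthetic stand-in for X_train
env = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X)
outliers = env.predict(X) == -1         # -1 marks the flagged rows
print(outliers.sum())                   # roughly 0.03 * 1000 = 30 rows
```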
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_clean1, y_clean1)

##grab importance values
importances = rf.feature_importances_

##sort the indices
sorted_indices = np.argsort(importances)[::-1]
import matplotlib.pyplot as plt

#ax.set_title('Feature Importance All Variables')
plt.rcdefaults() 
plt.figure(figsize=(15,15))
plt.barh(range(X_train.shape[1]), importances[sorted_indices], align='center', color = "darkgreen")
plt.yticks(range(X_train.shape[1]), X_train.columns[sorted_indices], fontsize=16)
plt.xticks(fontsize=20)
plt.savefig('FeatureImportanceALLVariables.png',dpi=300)
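The bar chart pairs importances with column names by position; the same ranking can also be inspected as a pandas Series (toy values shown here, not the fitted model's):

```python
import numpy as np
import pandas as pd

# toy stand-ins for rf.feature_importances_ and X_train.columns
importances = np.array([0.10, 0.55, 0.35])
columns = ['est_fte', 'acceptance_rate', 'act_composite_75_pctl']
ranked = pd.Series(importances, index=columns).sort_values(ascending=False)
print(ranked)
```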

Trimmed Feature Importance Chart

X_clean2  = X_clean1.iloc[: , :21]

rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_clean2, y_clean1)

##grab importance values
importances = rf.feature_importances_

##sort the indices
sorted_indices = np.argsort(importances)[::-1]
import matplotlib.pyplot as plt

#ax.set_title('Feature Importance All Variables')
plt.rcdefaults() 
plt.figure(figsize=(10,10))
plt.barh(range(X_clean2.shape[1]), importances[sorted_indices], align='center', color = "darkgreen")
plt.yticks(range(X_clean2.shape[1]), X_clean2.columns[sorted_indices], fontsize=16)
plt.xticks(fontsize=20)
plt.savefig('FeatureImportance21Variables.png',dpi=300)
 
X_train = train[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per', 'act_composite_75_pctl', 'act_composite_25_pctl', 'acceptance_rate']]

X_test = test[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per',  'act_composite_75_pctl', 'act_composite_25_pctl', 'acceptance_rate']]

y_train = train.completion_rate_150pct

y_test = test.completion_rate_150pct

X_train.head(1)
est_fte exp_instruc_total_per exp_acad_supp_total_per exp_student_serv_total_per exp_inst_supp_total_per act_composite_75_pctl act_composite_25_pctl acceptance_rate
11618 0.804298 0.092503 0.049112 0.06432 0.055147 0.236842 0.224299 0.54551

Trimmed Correlation Matrix

##Export a matrix for presentation
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
corrMatrix=X_train.corr()
plt.figure(figsize=(15,15))
plt.xticks(fontsize=20,)
plt.yticks(fontsize=20)
matrix = corrMatrix.round(2)  ##corrMatrix is already a correlation matrix; no second .corr() needed
sns.heatmap(matrix, annot=False, vmax=1, vmin=-1, center=0, cmap='vlag')
plt.xticks(rotation = 45)
plt.savefig('CorrelationMatrixTrimmed.png',dpi=300)

Create train and test sets for both full and trimmed datasets

X_trainFull = train.loc[:, ~train.columns.isin(['unitid', 'Grad_Rates_Two_Classes', 'exp_acad_inst_student_total_per','Grad_Rates_Three_Classes', 'Grad_Rates_Quartiles', 'completion_rate_150pct', 'year', 'cc_basic_2010'])]

X_testFull = test.loc[:, ~test.columns.isin(['unitid', 'Grad_Rates_Two_Classes', 'exp_acad_inst_student_total_per','Grad_Rates_Three_Classes', 'Grad_Rates_Quartiles', 'completion_rate_150pct', 'year', 'cc_basic_2010'])]

X_trainTrimmed = train[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per', 'act_composite_75_pctl', 'act_composite_25_pctl', 'acceptance_rate']]

X_testTrimmed = test[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per',  'act_composite_75_pctl', 'act_composite_25_pctl', 'acceptance_rate']]

y_trainRate = train.completion_rate_150pct

y_testRate = test.completion_rate_150pct

y_train2 = train.Grad_Rates_Two_Classes

y_test2 = test.Grad_Rates_Two_Classes

y_train3 = train.Grad_Rates_Three_Classes

y_test3 = test.Grad_Rates_Three_Classes

y_train4 = train.Grad_Rates_Quartiles

y_test4 = test.Grad_Rates_Quartiles
##regression sets
from sklearn.covariance import EllipticEnvelope
envelope1 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_trainFull)

outliers1 = envelope1.predict(X_trainFull)==-1  


X_cleanFullRate = X_trainFull[~outliers1]  
y_cleanFullRate = y_trainRate[~outliers1]

print(f"Num of outliers = {np.sum(outliers1)}")


from sklearn.covariance import EllipticEnvelope
envelope1b = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_trainTrimmed)

outliers1b = envelope1b.predict(X_trainTrimmed)==-1  


X_cleanTrimmedRate = X_trainTrimmed[~outliers1b]  
y_cleanTrimmedRate = y_trainRate[~outliers1b]

print(f"Num of outliers = {np.sum(outliers1b)}")
/opt/conda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:739: UserWarning: The covariance matrix associated to your dataset is not full rank
  "The covariance matrix associated to your dataset is not full rank"
Num of outliers = 353
Num of outliers = 353
print(X_cleanFullRate.shape)
print(X_cleanTrimmedRate.shape)
X_cleanTrimmedRate.head()
(11383, 70)
(11383, 8)
est_fte exp_instruc_total_per exp_acad_supp_total_per exp_student_serv_total_per exp_inst_supp_total_per act_composite_75_pctl act_composite_25_pctl acceptance_rate
11618 0.804298 0.092503 0.049112 0.064320 0.055147 0.236842 0.224299 0.545510
1509 0.102860 0.021577 0.008975 0.016913 0.012399 0.201754 0.177570 0.607435
11309 0.026311 0.030041 0.013175 0.075631 0.024995 0.236842 0.158879 0.698295
2576 0.022347 0.021895 0.010141 0.072629 0.032484 0.219298 0.196262 0.795709
9621 0.443424 0.049117 0.019240 0.015436 0.022622 0.228070 0.205607 0.528894
##anomaly detection and outlier removal
from sklearn.covariance import EllipticEnvelope
envelope2 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_trainFull)

outliers2 = envelope2.predict(X_trainFull)==-1  


X_clean2Full = X_trainFull[~outliers2]  
y_clean2Full = y_train2[~outliers2]

print(f"Num of outliers = {np.sum(outliers2)}")


envelope2b = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_trainTrimmed)

outliers2b = envelope2b.predict(X_trainTrimmed)==-1  


X_clean2Trimmed = X_trainTrimmed[~outliers2b]  
y_clean2Trimmed = y_train2[~outliers2b]

print(f"Num of outliers = {np.sum(outliers2b)}")
/opt/conda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:739: UserWarning: The covariance matrix associated to your dataset is not full rank
  "The covariance matrix associated to your dataset is not full rank"
Num of outliers = 353
Num of outliers = 353
##anomaly detection and outlier removal
envelope3 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_trainFull)

outliers3 = envelope3.predict(X_trainFull)==-1  


X_clean3Full = X_trainFull[~outliers3]  
y_clean3Full = y_train3[~outliers3]

print(f"Num of outliers = {np.sum(outliers3)}")


envelope3b = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_trainTrimmed)

outliers3b = envelope3b.predict(X_trainTrimmed)==-1  


X_clean3Trimmed = X_trainTrimmed[~outliers3b]  
y_clean3Trimmed = y_train3[~outliers3b]

print(f"Num of outliers = {np.sum(outliers3b)}")
/opt/conda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:739: UserWarning: The covariance matrix associated to your dataset is not full rank
  "The covariance matrix associated to your dataset is not full rank"
Num of outliers = 353
Num of outliers = 353
##anomaly detection and outlier removal
envelope4 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_trainFull)

outliers4 = envelope4.predict(X_trainFull)==-1  


X_clean4Full = X_trainFull[~outliers4]  
y_clean4Full = y_train4[~outliers4]

print(f"Num of outliers = {np.sum(outliers4)}")


envelope4b = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_trainTrimmed)

outliers4b = envelope4b.predict(X_trainTrimmed)==-1  


X_clean4Trimmed = X_trainTrimmed[~outliers4b]  
y_clean4Trimmed = y_train4[~outliers4b]

print(f"Num of outliers = {np.sum(outliers4b)}")
/opt/conda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:739: UserWarning: The covariance matrix associated to your dataset is not full rank
  "The covariance matrix associated to your dataset is not full rank"
Num of outliers = 353
Num of outliers = 353
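Since each envelope is refit on the same design matrix and `predict` depends only on X, the four blocks above flag the same 353 rows every time. A helper could compute each mask once and reuse it for all four targets (a sketch; `X_trainFull`, `y_train2`, etc. are the notebook's variables):

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

def outlier_mask(X, contamination=0.03):
    """Fit one envelope on X and return a boolean mask of flagged rows."""
    env = EllipticEnvelope(support_fraction=1, contamination=contamination).fit(X)
    return env.predict(X) == -1

# usage sketch: maskFull = outlier_mask(X_trainFull)
# X_cleanFull = X_trainFull[~maskFull]; y_clean2Full = y_train2[~maskFull]; ...
```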

Regression

Linear Regression with the Trimmed Set.


pipe0 = Pipeline([
    ('LinearRegression', LinearRegression())
])

param_grid0 = {
    ##n_jobs only controls parallelism; it does not affect the fitted model
    'LinearRegression__n_jobs': [1,3,5]
}
model_grid0 = GridSearchCV(pipe0, param_grid0, cv = 5, n_jobs = 5)
model_grid0.fit(X_cleanTrimmedRate, y_cleanTrimmedRate)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('LinearRegression',
                                        LinearRegression())]),
             n_jobs=5, param_grid={'LinearRegression__n_jobs': [1, 3, 5]})
model_grid0.best_params_ ##Best hyperparameters
{'LinearRegression__n_jobs': 1}
print(model_grid0.best_score_)
0.6406247457020943
y_predicted = model_grid0.predict(X_testTrimmed)
##explained variance
explained_variance_score(y_testRate, y_predicted)
0.5612968054281691
r2_score(y_testRate, y_predicted)
0.5610813750160539
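Alongside R² and explained variance, RMSE reports error in the units of the completion rate itself. A sketch with toy arrays standing in for `y_testRate` and `y_predicted`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([0.50, 0.60, 0.70])   # toy stand-in for y_testRate
y_pred = np.array([0.55, 0.58, 0.69])   # toy stand-in for y_predicted
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(round(rmse, 4))
```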
plt.figure(figsize=(15,15))
plt.scatter(y_testRate, y_predicted, c='crimson')
#plt.yscale('log')
#plt.xscale('log')

p1 = max(max(y_predicted), max(y_testRate))
p2 = min(min(y_predicted), min(y_testRate))
plt.plot([p1, p2], [p1, p2], 'b-')
plt.xlabel('True Values', fontsize=20)
plt.ylabel('Predictions', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.axis('equal')
plt.savefig('RegressionActualVsPredicted.png',dpi=300)

Regression with PCA


pipe = Pipeline([
    ('PCA', PCA()),
    ('LinearRegression', LinearRegression())
])
param_grid = {
    
    'PCA__n_components': [1, 10, 15, 20, 25, 30, 35, 40, 50, 55, 60, 65, 70],
    'LinearRegression__n_jobs': [1,3,5]
}
model_grid = GridSearchCV(pipe, param_grid, cv = 5, n_jobs = 5)
model_grid.fit(X_cleanFullRate, y_cleanFullRate)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('PCA', PCA()),
                                       ('LinearRegression',
                                        LinearRegression())]),
             n_jobs=5,
             param_grid={'LinearRegression__n_jobs': [1, 3, 5],
                         'PCA__n_components': [1, 10, 15, 20, 25, 30, 35, 40,
                                               50, 55, 60, 65, 70]})
model_grid.best_params_ ##Best hyperparameters
{'LinearRegression__n_jobs': 1, 'PCA__n_components': 70}
print(model_grid.best_score_)
0.7242460606519229
y_predicted = model_grid.predict(X_testFull)
##explained variance
explained_variance_score(y_testRate, y_predicted)
0.6841386098841067
r2_score(y_testRate, y_predicted)
0.6838104860252272
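The grid picks `PCA__n_components = 70`, i.e. all of the features; with every component kept, PCA is just a centering-plus-rotation, so the downstream linear regression fits the same model as plain OLS on the raw columns. A small demonstration on synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

plain = LinearRegression().fit(X, y)
rotated = Pipeline([('PCA', PCA(n_components=5)),
                    ('LinearRegression', LinearRegression())]).fit(X, y)
# full-rank PCA is invertible, so the predictions coincide (up to float error)
print(np.allclose(plain.predict(X), rotated.predict(X)))
```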
plt.figure(figsize=(15,15))
plt.scatter(y_testRate, y_predicted, c='crimson')
#plt.yscale('log')
#plt.xscale('log')
p1 = max(max(y_predicted), max(y_testRate))
p2 = min(min(y_predicted), min(y_testRate))
plt.plot([p1, p2], [p1, p2], 'b-')
plt.xlabel('True Values', fontsize=20)
plt.ylabel('Predictions', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.axis('equal')
plt.savefig('RegressionActualVsPredictedPCA.png',dpi=300)

Classification

Low / High (Two Classes)

PCA and Random Forest on all Features

pipe2 = Pipeline([
    ('PCA', PCA()),
    ('RandomForestClassifier', RandomForestClassifier())
])

param_grid2 = {
    'PCA__n_components': [1, 10, 30, 50, 70],
    'RandomForestClassifier__n_estimators': [50, 150],
    'RandomForestClassifier__criterion': ['gini', 'entropy'],
    'RandomForestClassifier__max_depth': [8]
}

model_grid2 = GridSearchCV(pipe2, param_grid2, cv = 5, n_jobs = 2)
model_grid2.fit(X_clean2Full, y_clean2Full)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('PCA', PCA()),
                                       ('RandomForestClassifier',
                                        RandomForestClassifier())]),
             n_jobs=2,
             param_grid={'PCA__n_components': [1, 10, 30, 50, 70],
                         'RandomForestClassifier__criterion': ['gini',
                                                               'entropy'],
                         'RandomForestClassifier__max_depth': [8],
                         'RandomForestClassifier__n_estimators': [50, 150]})
model_grid2.best_params_ ##Best hyperparameters
{'PCA__n_components': 70,
 'RandomForestClassifier__criterion': 'gini',
 'RandomForestClassifier__max_depth': 8,
 'RandomForestClassifier__n_estimators': 150}
print(model_grid2.best_score_)
0.8418697558607393
accuracy_score(y_test2, model_grid2.predict(X_testFull))
0.8404907975460123
##get predictions
y_predicted = model_grid2.predict(X_testFull)

##Auc Score
fpr, tpr, thresholds = roc_curve(y_test2, y_predicted)
auc(fpr, tpr)
0.8412299446782205
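Note that feeding hard 0/1 predictions into roc_curve yields a single operating point, so the AUC above reduces to balanced accuracy; computing it from predict_proba scores (as done later for the three-class case) keeps the model's ranking information. A toy illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
hard  = np.array([0, 1, 1, 1])            # thresholded labels: one operating point
proba = np.array([0.1, 0.3, 0.35, 0.8])   # scores keep the full ranking
print(roc_auc_score(y_true, hard), roc_auc_score(y_true, proba))
```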
y_predicted = model_grid2.predict(X_testFull)
y_predicted[1:20]
print(confusion_matrix(y_test2, y_predicted))
print(classification_report(y_test2, y_predicted))
[[838 119]
 [193 806]]
              precision    recall  f1-score   support

           0       0.81      0.88      0.84       957
           1       0.87      0.81      0.84       999

    accuracy                           0.84      1956
   macro avg       0.84      0.84      0.84      1956
weighted avg       0.84      0.84      0.84      1956

plt.figure(figsize=(20,20))
cmd = ConfusionMatrixDisplay.from_estimator(model_grid2, X_testFull, y_test2, normalize = 'all', display_labels=['Low','High'])
cmd.figure_.savefig('RandomPCA_LH_Fig1F.png',dpi=300)
<Figure size 1440x1440 with 0 Axes>

Random Forest with Trimmed Variables

pipe3 = Pipeline([
    ('RandomForestClassifier', RandomForestClassifier()) 
])

param_grid3 = {
    'RandomForestClassifier__n_estimators': [50, 150],
    'RandomForestClassifier__criterion': ['gini', 'entropy'],
    'RandomForestClassifier__max_depth': [8]
    
}
    
model_grid3 = GridSearchCV(pipe3, param_grid3, cv = 5, n_jobs = 5)
model_grid3.fit(X_clean2Trimmed, y_clean2Trimmed)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('RandomForestClassifier',
                                        RandomForestClassifier())]),
             n_jobs=5,
             param_grid={'RandomForestClassifier__criterion': ['gini',
                                                               'entropy'],
                         'RandomForestClassifier__max_depth': [8],
                         'RandomForestClassifier__n_estimators': [50, 150]})
model_grid3.best_params_ ##Best hyperparameters
{'RandomForestClassifier__criterion': 'gini',
 'RandomForestClassifier__max_depth': 8,
 'RandomForestClassifier__n_estimators': 150}
y_predicted = model_grid3.predict(X_testTrimmed)
y_predicted[1:20]
print(confusion_matrix(y_test2, y_predicted))
print(classification_report(y_test2, y_predicted))
[[839 118]
 [191 808]]
              precision    recall  f1-score   support

           0       0.81      0.88      0.84       957
           1       0.87      0.81      0.84       999

    accuracy                           0.84      1956
   macro avg       0.84      0.84      0.84      1956
weighted avg       0.84      0.84      0.84      1956

print(model_grid3.best_score_)
0.8383546244133087
accuracy_score(y_test2, model_grid3.predict(X_testTrimmed))
0.8420245398773006
##get predictions
y_predicted = model_grid3.predict(X_testTrimmed)

##Auc Score
fpr, tpr, thresholds = roc_curve(y_test2, y_predicted)
auc(fpr, tpr)
0.8427534117189288
plt.figure(figsize=(20,20))
cmd = ConfusionMatrixDisplay.from_estimator(model_grid3, X_testTrimmed, y_test2, normalize = 'all', display_labels=['Low','High'])
cmd.figure_.savefig('Random_LH_Fig2F.png',dpi=300)
<Figure size 1440x1440 with 0 Axes>

Decision Tree

pipeDT2 = Pipeline([
    ('DecisionTreeClassifier', DecisionTreeClassifier()) 
])

param_gridDT2 = {
    'DecisionTreeClassifier__min_samples_leaf': [10, 50, 150, 300],
    'DecisionTreeClassifier__criterion': ['gini', 'entropy'],
    'DecisionTreeClassifier__max_depth': [5, 10, 15, 20]
    
}
    
model_gridDT2 = GridSearchCV(pipeDT2, param_gridDT2, cv = 5, n_jobs = 5)
model_gridDT2.fit(X_clean2Trimmed, y_clean2Trimmed)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('DecisionTreeClassifier',
                                        DecisionTreeClassifier())]),
             n_jobs=5,
             param_grid={'DecisionTreeClassifier__criterion': ['gini',
                                                               'entropy'],
                         'DecisionTreeClassifier__max_depth': [5, 10, 15, 20],
                         'DecisionTreeClassifier__min_samples_leaf': [10, 50,
                                                                      150,
                                                                      300]})
model_gridDT2.best_params_ ##Best hyperparameters
{'DecisionTreeClassifier__criterion': 'gini',
 'DecisionTreeClassifier__max_depth': 5,
 'DecisionTreeClassifier__min_samples_leaf': 150}
y_predicted = model_gridDT2.predict(X_testTrimmed)
y_predicted[1:20]
print(confusion_matrix(y_test2, y_predicted))
print(classification_report(y_test2, y_predicted))
[[822 135]
 [219 780]]
              precision    recall  f1-score   support

           0       0.79      0.86      0.82       957
           1       0.85      0.78      0.82       999

    accuracy                           0.82      1956
   macro avg       0.82      0.82      0.82      1956
weighted avg       0.82      0.82      0.82      1956

print(model_gridDT2.best_score_)
0.8174472238237808
accuracy_score(y_test2, model_gridDT2.predict(X_testTrimmed))
0.8190184049079755
##get predictions
y_predicted = model_gridDT2.predict(X_testTrimmed)

##Auc Score
fpr, tpr, thresholds = roc_curve(y_test2, y_predicted)
auc(fpr, tpr)
0.8198574750298888
plt.figure(figsize=(20,20))
cmd = ConfusionMatrixDisplay.from_estimator(model_gridDT2, X_testTrimmed, y_test2, normalize = 'all', display_labels=['Low','High'])
cmd.figure_.savefig('DecisionTree_LH_Fig2F.png',dpi=300)
<Figure size 1440x1440 with 0 Axes>

Decision Tree without GridSearchCV

clf = DecisionTreeClassifier(max_depth = 5, criterion = 'gini', min_samples_leaf = 150, random_state=21)
model = clf.fit(X_clean2Trimmed, y_clean2Trimmed)
feature_cols = list(X_clean2Trimmed.columns)
target_names = ['Low','High']
pred = clf.predict(X_testTrimmed)
acc = accuracy_score(y_test2, pred)
acc
0.8190184049079755
fig = plt.figure(figsize=(35,20))
_ = tree.plot_tree(clf, fontsize=12,
                   feature_names=feature_cols,  
                   class_names=target_names,
                   filled=True)
fig.savefig('Tree2ClassF.png',dpi=300)
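For a text-only view of the same rules, `sklearn.tree.export_text` prints the splits without a figure. A sketch on a tiny synthetic tree (not the fitted `clf`):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)            # label depends only on the first feature
toy = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(toy, feature_names=['f0', 'f1']))
```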

Dummy Classifier

pipeDum2 = Pipeline([
    ('DummyClassifier', DummyClassifier()) 
])

param_gridDum2 = {
    'DummyClassifier__strategy': ['most_frequent', 'prior', 'stratified', 'uniform'],
}
    
model_gridDum2 = GridSearchCV(pipeDum2, param_gridDum2, cv = 5, n_jobs = 5)
model_gridDum2.fit(X_clean2Trimmed, y_clean2Trimmed)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('DummyClassifier', DummyClassifier())]),
             n_jobs=5,
             param_grid={'DummyClassifier__strategy': ['most_frequent', 'prior',
                                                       'stratified',
                                                       'uniform']})
model_gridDum2.best_params_ ##Best hyperparameters
{'DummyClassifier__strategy': 'most_frequent'}
y_predicted = model_gridDum2.predict(X_testTrimmed)
y_predicted[1:20]
print(confusion_matrix(y_test2, y_predicted))
print(classification_report(y_test2, y_predicted))
[[957   0]
 [999   0]]
              precision    recall  f1-score   support

           0       0.49      1.00      0.66       957
           1       0.00      0.00      0.00       999

    accuracy                           0.49      1956
   macro avg       0.24      0.50      0.33      1956
weighted avg       0.24      0.49      0.32      1956

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
print(model_gridDum2.best_score_)
0.5134850453028799
accuracy_score(y_test2, model_gridDum2.predict(X_testTrimmed))
0.4892638036809816
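The 'most_frequent' baseline's test accuracy is simply the majority-class share of y_test2, which matches the support counts in the classification report above:

```python
# class 0 has 957 of the 1956 test rows (see the classification report)
print(957 / (957 + 999))
```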
##get predictions
y_predicted = model_gridDum2.predict(X_testTrimmed)

##Auc Score
fpr, tpr, thresholds = roc_curve(y_test2, y_predicted)
auc(fpr, tpr)
0.5
plt.figure(figsize=(20,20))
cmd = ConfusionMatrixDisplay.from_estimator(model_gridDum2, X_testTrimmed, y_test2, normalize = 'all', display_labels=['Low','High'])
cmd.figure_.savefig('Dummy_LH_Fig2F.png',dpi=300)
<Figure size 1440x1440 with 0 Axes>

Low / Medium / High (Three Classes)

PCA and Random Forest on all Features

pipe4 = Pipeline([
    ('PCA', PCA()),
    ('RandomForestClassifier', RandomForestClassifier())
])

param_grid4 = {
    'PCA__n_components': [1, 10, 30, 50, 70],
    'RandomForestClassifier__n_estimators': [50, 150],
    'RandomForestClassifier__criterion': ['gini', 'entropy'],
    'RandomForestClassifier__max_depth': [8]
}

model_grid4 = GridSearchCV(pipe4, param_grid4, cv = 5, n_jobs = 2)
model_grid4.fit(X_clean3Full, y_clean3Full)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('PCA', PCA()),
                                       ('RandomForestClassifier',
                                        RandomForestClassifier())]),
             n_jobs=2,
             param_grid={'PCA__n_components': [1, 10, 30, 50, 70],
                         'RandomForestClassifier__criterion': ['gini',
                                                               'entropy'],
                         'RandomForestClassifier__max_depth': [8],
                         'RandomForestClassifier__n_estimators': [50, 150]})
model_grid4.best_params_ ##Best hyperparameters
{'PCA__n_components': 70,
 'RandomForestClassifier__criterion': 'entropy',
 'RandomForestClassifier__max_depth': 8,
 'RandomForestClassifier__n_estimators': 150}
print(model_grid4.best_score_)
0.7322306120732038
accuracy_score(y_test3, model_grid4.predict(X_testFull))
0.7361963190184049
#Calculate the y_score
y_score = model_grid4.predict_proba(X_testFull)

##get auc score
roc_auc_score(y_test3, y_score, multi_class='ovr')
0.8893431742071044
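With 70 PCA components selected by the grid search, it is worth checking how much variance the components actually capture. A minimal sketch on synthetic data (the array here is an illustrative stand-in, not the notebook's feature matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))  ## synthetic stand-in for the feature matrix

pca = PCA(n_components=5).fit(X_demo)

## explained_variance_ratio_ gives the fraction of variance per component;
## the cumulative sum shows how quickly the components account for the data
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(cumulative)
```

On the fitted grid search, the same attribute should be reachable as `model_grid4.best_estimator_.named_steps['PCA'].explained_variance_ratio_`.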
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(model_grid4, X_testFull, y_test3, normalize='all', display_labels=['Low', 'Medium', 'High'], ax=ax)
fig.savefig('PCARandom_LMH_Fig3F.png', dpi=300)

Random Forest

pipe5 = Pipeline([
    ('RandomForestClassifier', RandomForestClassifier()) 
])

param_grid5 = {
    'RandomForestClassifier__n_estimators': [50, 150],
    'RandomForestClassifier__criterion': ['gini', 'entropy'],
    'RandomForestClassifier__max_depth': [8]
    
}
    
model_grid5 = GridSearchCV(pipe5, param_grid5, cv = 5, n_jobs = 5)
model_grid5.fit(X_clean3Trimmed, y_clean3Trimmed)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('RandomForestClassifier',
                                        RandomForestClassifier())]),
             n_jobs=5,
             param_grid={'RandomForestClassifier__criterion': ['gini',
                                                               'entropy'],
                         'RandomForestClassifier__max_depth': [8],
                         'RandomForestClassifier__n_estimators': [50, 150]})
model_grid5.best_params_ ##Best hyperparameters
{'RandomForestClassifier__criterion': 'gini',
 'RandomForestClassifier__max_depth': 8,
 'RandomForestClassifier__n_estimators': 150}
y_predicted = model_grid5.predict(X_testTrimmed)
y_predicted[1:20]
print(confusion_matrix(y_test3, y_predicted))
print(classification_report(y_test3, y_predicted))
[[503 143   9]
 [170 386  92]
 [ 28 109 516]]
              precision    recall  f1-score   support

           0       0.72      0.77      0.74       655
           1       0.61      0.60      0.60       648
           2       0.84      0.79      0.81       653

    accuracy                           0.72      1956
   macro avg       0.72      0.72      0.72      1956
weighted avg       0.72      0.72      0.72      1956

print(model_grid5.best_score_)
0.7190543395288562
accuracy_score(y_test3, model_grid5.predict(X_testTrimmed))
0.7183026584867076
#Calculate the y_score
y_score = model_grid5.predict_proba(X_testTrimmed)

##get auc score
roc_auc_score(y_test3, y_score, multi_class='ovr')
0.8842140672359636
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(model_grid5, X_testTrimmed, y_test3, normalize='all', display_labels=['Low', 'Medium', 'High'], ax=ax)
fig.savefig('Random_LMH_Fig4F.png', dpi=300)

Decision Tree

pipeDT3 = Pipeline([
    ('DecisionTreeClassifier', DecisionTreeClassifier()) 
])

param_gridDT3 = {
    'DecisionTreeClassifier__min_samples_leaf': [10, 50, 150, 300],
    'DecisionTreeClassifier__criterion': ['gini', 'entropy'],
    'DecisionTreeClassifier__max_depth': [5, 10, 15, 20]
    
}
    
model_gridDT3 = GridSearchCV(pipeDT3, param_gridDT3, cv = 5, n_jobs = 5)
model_gridDT3.fit(X_clean3Trimmed, y_clean3Trimmed)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('DecisionTreeClassifier',
                                        DecisionTreeClassifier())]),
             n_jobs=5,
             param_grid={'DecisionTreeClassifier__criterion': ['gini',
                                                               'entropy'],
                         'DecisionTreeClassifier__max_depth': [5, 10, 15, 20],
                         'DecisionTreeClassifier__min_samples_leaf': [10, 50,
                                                                      150,
                                                                      300]})
print(model_gridDT3.best_params_) ##Best hyperparameters
##model evaluation 
##get predictions
y_predicted = model_gridDT3.predict(X_testTrimmed)
print(confusion_matrix(y_test3, y_predicted))
print(classification_report(y_test3, y_predicted))
##print model score
print("Model Score", model_gridDT3.best_score_)
##Accuracy Score
print("Accuracy Score",accuracy_score(y_test3, model_gridDT3.predict(X_testTrimmed)))

##get auc score
#Calculate the y_score
y_score = model_gridDT3.predict_proba(X_testTrimmed)

print("AUC Score", roc_auc_score(y_test3, y_score, multi_class='ovr'))
{'DecisionTreeClassifier__criterion': 'gini', 'DecisionTreeClassifier__max_depth': 10, 'DecisionTreeClassifier__min_samples_leaf': 50}
[[468 172  15]
 [164 375 109]
 [ 27 125 501]]
              precision    recall  f1-score   support

           0       0.71      0.71      0.71       655
           1       0.56      0.58      0.57       648
           2       0.80      0.77      0.78       653

    accuracy                           0.69      1956
   macro avg       0.69      0.69      0.69      1956
weighted avg       0.69      0.69      0.69      1956

Model Score 0.6807513509049385
Accuracy Score 0.6871165644171779
AUC Score 0.8531655331494479
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(model_gridDT3, X_testTrimmed, y_test3, normalize='all', display_labels=['Low', 'Medium', 'High'], ax=ax)
fig.savefig('DecisionTree_LMH_Fig4F.png', dpi=300)

Decision Tree without GridSearchCV

clf = DecisionTreeClassifier(max_depth = 10, criterion = 'gini', min_samples_leaf = 50, random_state=21)
model = clf.fit(X_clean3Trimmed, y_clean3Trimmed)
feature_cols = X_clean3Trimmed.columns.tolist()
target_names = ['Low', 'Medium', 'High']
X = X_clean3Trimmed
y = y_clean3Trimmed
fig = plt.figure(figsize=(45,20))
_ = tree.plot_tree(clf, fontsize=12,
                   feature_names=feature_cols,  
                   class_names=target_names,
                   filled=True)
fig.savefig('Tree3ClassF.png',dpi=300)
pred = clf.predict(X_testTrimmed)
acc = accuracy_score(y_test3, pred)
acc
0.6886503067484663
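Besides `plot_tree`, sklearn can dump the fitted split rules as plain text, which is easier to search and compare than the rendered figure. A minimal sketch on the iris data (an illustrative stand-in for the notebook's training set):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf_demo = DecisionTreeClassifier(max_depth=3, random_state=21).fit(iris.data, iris.target)

## export_text prints one indented line per split, using the feature names
rules = export_text(clf_demo, feature_names=list(iris.feature_names))
print(rules)
```

The same call with `clf` and `feature_cols` would print the rules of the tree plotted above.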

Dummy Classifier

pipeDum3 = Pipeline([
    ('DummyClassifier', DummyClassifier()) 
])

param_gridDum3 = {
    'DummyClassifier__strategy': ['most_frequent', 'prior', 'stratified', 'uniform'],
}
    
model_gridDum3 = GridSearchCV(pipeDum3, param_gridDum3, cv = 5, n_jobs = 5)
model_gridDum3.fit(X_clean3Trimmed, y_clean3Trimmed)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('DummyClassifier', DummyClassifier())]),
             n_jobs=5,
             param_grid={'DummyClassifier__strategy': ['most_frequent', 'prior',
                                                       'stratified',
                                                       'uniform']})
model_gridDum3.best_params_ ##Best hyperparameters
{'DummyClassifier__strategy': 'most_frequent'}
y_predicted = model_gridDum3.predict(X_testTrimmed)
y_predicted[1:20]
print(confusion_matrix(y_test3, y_predicted))
print(classification_report(y_test3, y_predicted))


##print model score
print("Model Score:", model_gridDum3.best_score_)

##Accuracy Score
print("Accuracy Score:",accuracy_score(y_test3, model_gridDum3.predict(X_testTrimmed)))

#Calculate the y_score
y_score = model_gridDum3.predict_proba(X_testTrimmed)

print("AUC Score", roc_auc_score(y_test3, y_score, multi_class='ovr'))
[[655   0   0]
 [648   0   0]
 [653   0   0]]
              precision    recall  f1-score   support

           0       0.33      1.00      0.50       655
           1       0.00      0.00      0.00       648
           2       0.00      0.00      0.00       653

    accuracy                           0.33      1956
   macro avg       0.11      0.33      0.17      1956
weighted avg       0.11      0.33      0.17      1956

Model Score: 0.3480628860624276
Accuracy Score: 0.33486707566462165
AUC Score 0.5
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(model_gridDum3, X_testTrimmed, y_test3, normalize='all', display_labels=['Low', 'Medium', 'High'], ax=ax)
fig.savefig('Dummy_LMH_Fig4F.png', dpi=300)

Quartiles

PCA and Random Forest on all Features

pipe6 = Pipeline([
    ('PCA', PCA()),
    ('RandomForestClassifier', RandomForestClassifier())
])

param_grid6 = {
    'PCA__n_components': [1, 10, 30, 50, 70],
    'RandomForestClassifier__n_estimators': [50, 150],
    'RandomForestClassifier__criterion': ['gini', 'entropy'],
    'RandomForestClassifier__max_depth': [8]
}

model_grid6 = GridSearchCV(pipe6, param_grid6, cv = 5, n_jobs = 2)
model_grid6.fit(X_clean4Full, y_clean4Full)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('PCA', PCA()),
                                       ('RandomForestClassifier',
                                        RandomForestClassifier())]),
             n_jobs=2,
             param_grid={'PCA__n_components': [1, 10, 30, 50, 70],
                         'RandomForestClassifier__criterion': ['gini',
                                                               'entropy'],
                         'RandomForestClassifier__max_depth': [8],
                         'RandomForestClassifier__n_estimators': [50, 150]})
model_grid6.best_params_ ##Best hyperparameters
{'PCA__n_components': 70,
 'RandomForestClassifier__criterion': 'entropy',
 'RandomForestClassifier__max_depth': 8,
 'RandomForestClassifier__n_estimators': 150}
print(model_grid6.best_score_)
0.6550116817290348
accuracy_score(y_test4, model_grid6.predict(X_testFull))
0.6451942740286298
#Calculate the y_score
y_score = model_grid6.predict_proba(X_testFull)

##get auc score
roc_auc_score(y_test4, y_score, multi_class='ovr')
0.8741130496755417
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(model_grid6, X_testFull, y_test4, normalize='all', display_labels=['Q1', 'Q2', 'Q3', 'Q4'], ax=ax)
fig.savefig('PCARandom_Quart_Fig5F.png', dpi=300)

Random Forest with Trimmed Set

pipe7 = Pipeline([
    ('RandomForestClassifier', RandomForestClassifier()) 
])

param_grid7 = {
    'RandomForestClassifier__n_estimators': [50, 150],
    'RandomForestClassifier__criterion': ['gini', 'entropy'],
    'RandomForestClassifier__max_depth': [8]
    
}
    
model_grid7 = GridSearchCV(pipe7, param_grid7, cv = 5, n_jobs = 5)
model_grid7.fit(X_clean4Trimmed, y_clean4Trimmed)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('RandomForestClassifier',
                                        RandomForestClassifier())]),
             n_jobs=5,
             param_grid={'RandomForestClassifier__criterion': ['gini',
                                                               'entropy'],
                         'RandomForestClassifier__max_depth': [8],
                         'RandomForestClassifier__n_estimators': [50, 150]})
model_grid7.best_params_ ##Best hyperparameters
{'RandomForestClassifier__criterion': 'entropy',
 'RandomForestClassifier__max_depth': 8,
 'RandomForestClassifier__n_estimators': 50}
y_predicted = model_grid7.predict(X_testTrimmed)
y_predicted[1:20]
print(confusion_matrix(y_test4, y_predicted))
print(classification_report(y_test4, y_predicted))
[[370 111  17   1]
 [131 226 107   8]
 [ 36 114 272  63]
 [ 10  14 108 368]]
              precision    recall  f1-score   support

           0       0.68      0.74      0.71       499
           1       0.49      0.48      0.48       472
           2       0.54      0.56      0.55       485
           3       0.84      0.74      0.78       500

    accuracy                           0.63      1956
   macro avg       0.63      0.63      0.63      1956
weighted avg       0.64      0.63      0.63      1956

print(model_grid7.best_score_)
0.6278644934868669
accuracy_score(y_test4, model_grid7.predict(X_testTrimmed))
0.6319018404907976
#Calculate the y_score
y_score = model_grid7.predict_proba(X_testTrimmed)

##get auc score
roc_auc_score(y_test4, y_score, multi_class='ovr')
0.8677186285233699
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(model_grid7, X_testTrimmed, y_test4, normalize='all', display_labels=['Q1', 'Q2', 'Q3', 'Q4'], ax=ax)
fig.savefig('Random_Quart_Fig6F.png', dpi=300)

Decision Tree on Trimmed Set

pipeDT4 = Pipeline([
    ('DecisionTreeClassifier', DecisionTreeClassifier()) 
])

param_gridDT4 = {
    'DecisionTreeClassifier__min_samples_leaf': [10, 50, 150, 300],
    'DecisionTreeClassifier__criterion': ['gini', 'entropy'],
    'DecisionTreeClassifier__max_depth': [5, 10, 15, 20]
    
}
    
model_gridDT4 = GridSearchCV(pipeDT4, param_gridDT4, cv = 5, n_jobs = 5)
model_gridDT4.fit(X_clean4Trimmed, y_clean4Trimmed)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('DecisionTreeClassifier',
                                        DecisionTreeClassifier())]),
             n_jobs=5,
             param_grid={'DecisionTreeClassifier__criterion': ['gini',
                                                               'entropy'],
                         'DecisionTreeClassifier__max_depth': [5, 10, 15, 20],
                         'DecisionTreeClassifier__min_samples_leaf': [10, 50,
                                                                      150,
                                                                      300]})
print(model_gridDT4.best_params_) ##Best hyperparameters
##model evaluation 
##get predictions
y_predicted = model_gridDT4.predict(X_testTrimmed)
print(confusion_matrix(y_test4, y_predicted))
print(classification_report(y_test4, y_predicted))
##print model score
print("Model Score", model_gridDT4.best_score_)
##Accuracy Score
print("Accuracy Score",accuracy_score(y_test4, model_gridDT4.predict(X_testTrimmed)))

##get auc score
#Calculate the y_score
y_score = model_gridDT4.predict_proba(X_testTrimmed)

print("AUC Score", roc_auc_score(y_test4, y_score, multi_class='ovr'))
{'DecisionTreeClassifier__criterion': 'gini', 'DecisionTreeClassifier__max_depth': 10, 'DecisionTreeClassifier__min_samples_leaf': 50}
[[351 113  34   1]
 [129 219 118   6]
 [ 44 125 250  66]
 [ 11  25 123 341]]
              precision    recall  f1-score   support

           0       0.66      0.70      0.68       499
           1       0.45      0.46      0.46       472
           2       0.48      0.52      0.50       485
           3       0.82      0.68      0.75       500

    accuracy                           0.59      1956
   macro avg       0.60      0.59      0.59      1956
weighted avg       0.61      0.59      0.60      1956

Model Score 0.5870157986991487
Accuracy Score 0.593558282208589
AUC Score 0.8390671278900932
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(model_gridDT4, X_testTrimmed, y_test4, normalize='all', display_labels=['Q1', 'Q2', 'Q3', 'Q4'], ax=ax)
fig.savefig('DecisionTree_Quart_Fig5F.png', dpi=300)

Decision Tree without GridSearchCV

clf = DecisionTreeClassifier(max_depth = 10, criterion = 'gini', min_samples_leaf = 50, random_state=21)
model = clf.fit(X_clean4Trimmed, y_clean4Trimmed)
pred = clf.predict(X_testTrimmed)
acc = accuracy_score(y_test4, pred)
acc
0.593558282208589
feature_cols = X_clean4Trimmed.columns.tolist()
target_names = ['Q1', 'Q2', 'Q3', 'Q4']
X = X_clean4Trimmed
y = y_clean4Trimmed
fig = plt.figure(figsize=(45,20))
_ = tree.plot_tree(clf, fontsize=12,
                   feature_names=feature_cols,  
                   class_names=target_names,
                   filled=True)
fig.savefig('Tree4Class.png',dpi=300)

Dummy Classifier

pipeDum4 = Pipeline([
    ('DummyClassifier', DummyClassifier()) 
])

param_gridDum4 = {
    'DummyClassifier__strategy': ['most_frequent', 'prior', 'stratified', 'uniform'],
}
    
model_gridDum4 = GridSearchCV(pipeDum4, param_gridDum4, cv = 5, n_jobs = 5)
model_gridDum4.fit(X_clean4Trimmed, y_clean4Trimmed)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('DummyClassifier', DummyClassifier())]),
             n_jobs=5,
             param_grid={'DummyClassifier__strategy': ['most_frequent', 'prior',
                                                       'stratified',
                                                       'uniform']})
model_gridDum4.best_params_ ##Best hyperparameters
{'DummyClassifier__strategy': 'most_frequent'}
y_predicted = model_gridDum4.predict(X_testTrimmed)
y_predicted[1:20]
print(confusion_matrix(y_test4, y_predicted))
print(classification_report(y_test4, y_predicted))


##print model score
print("Model Score:", model_gridDum4.best_score_)

##Accuracy Score
print("Accuracy Score:",accuracy_score(y_test4, model_gridDum4.predict(X_testTrimmed)))

#Calculate the y_score
y_score = model_gridDum4.predict_proba(X_testTrimmed)

print("AUC Score", roc_auc_score(y_test4, y_score, multi_class='ovr'))
[[499   0   0   0]
 [472   0   0   0]
 [485   0   0   0]
 [500   0   0   0]]
              precision    recall  f1-score   support

           0       0.26      1.00      0.41       499
           1       0.00      0.00      0.00       472
           2       0.00      0.00      0.00       485
           3       0.00      0.00      0.00       500

    accuracy                           0.26      1956
   macro avg       0.06      0.25      0.10      1956
weighted avg       0.07      0.26      0.10      1956

Model Score: 0.26750418527754816
Accuracy Score: 0.2551124744376278
AUC Score 0.5
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(model_gridDum4, X_testTrimmed, y_test4, normalize='all', display_labels=['Q1', 'Q2', 'Q3', 'Q4'], ax=ax)
fig.savefig('Dummy_Quart_Fig5F.png', dpi=300)

Without Selectors

For this analysis I used random forest, but without PCA, since the number of features has already been reduced significantly.

Regression

X_train = train.loc[:, ~dropped.columns.isin(['unitid', 'Grad_Rates_Two_Classes', 'exp_acad_inst_student_total_per','Grad_Rates_Three_Classes', 'Grad_Rates_Quartiles', 'completion_rate_150pct', 'year', 'cc_basic_2010', 'act_composite_75_pctl', 'act_composite_25_pctl', 'acceptance_rate'])]

X_test = test.loc[:, ~dropped.columns.isin(['unitid', 'Grad_Rates_Two_Classes', 'exp_acad_inst_student_total_per','Grad_Rates_Three_Classes', 'Grad_Rates_Quartiles', 'completion_rate_150pct', 'year', 'cc_basic_2010', 'act_composite_75_pctl', 'act_composite_25_pctl', 'acceptance_rate'])]

y_train = train.completion_rate_150pct

y_test = test.completion_rate_150pct
from sklearn.covariance import EllipticEnvelope
envelope1 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_train)

outliers1 = envelope1.predict(X_train)==-1  


X_clean1 = X_train[~outliers1]  
y_clean1 = y_train[~outliers1]

print(f"Num of outliers = {np.sum(outliers1)}")
/opt/conda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:739: UserWarning: The covariance matrix associated to your dataset is not full rank
  "The covariance matrix associated to your dataset is not full rank"
Num of outliers = 353
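EllipticEnvelope fits a robust Gaussian estimate of the data and flags the most distant `contamination` fraction of points as outliers (`predict` returns -1 for them, 1 for inliers). A minimal sketch with synthetic data and one planted extreme point:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X_inliers = rng.normal(0, 1, size=(99, 2))       ## tight Gaussian cluster
X_demo = np.vstack([X_inliers, [[10.0, 10.0]]])  ## one planted extreme point

envelope = EllipticEnvelope(support_fraction=1, contamination=0.01).fit(X_demo)
labels = envelope.predict(X_demo)  ## -1 = outlier, 1 = inlier

print(labels[-1])  ## the planted point is flagged as -1
```

`support_fraction=1` makes the estimator use all samples for the covariance fit, matching the call above.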
pipe20 = Pipeline([
    ('PCA', PCA()),
    ('LinearRegression', LinearRegression())
])
# Configure the parameters for grid search
param_grid20 = {
    'PCA__n_components': [1, 10, 30, 50, 55],
    'LinearRegression__n_jobs': [1,3,5]
}

# Train the Pipeline with Grid Search 
model_grid20 = GridSearchCV(pipe20, param_grid20, cv = 5, n_jobs = 5)
model_grid20.fit(X_clean1, y_clean1)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('PCA', PCA()),
                                       ('LinearRegression',
                                        LinearRegression())]),
             n_jobs=5,
             param_grid={'LinearRegression__n_jobs': [1, 3, 5],
                         'PCA__n_components': [1, 10, 30, 50, 55]})
model_grid20.best_params_ ##Best hyperparameters
{'LinearRegression__n_jobs': 1, 'PCA__n_components': 55}
print(model_grid20.best_score_)
0.2857545618109319
y_predicted = model_grid20.predict(X_test)
##explained variance
explained_variance_score(y_test, y_predicted)
0.2684675397803741
##r2 score
r2_score(y_test, y_predicted)
0.2654696748067532
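Alongside R² and explained variance, absolute error metrics are often easier to interpret for a target that is a rate between 0 and 1. A minimal sketch with hypothetical values (not the notebook's predictions):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([0.50, 0.62, 0.71, 0.40])  ## hypothetical completion rates
y_pred = np.array([0.55, 0.60, 0.65, 0.48])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  ## same units as the target

print(mae, rmse)
```

Applied to `y_test` and `y_predicted` above, these would report the typical error in completion-rate points.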
plt.figure(figsize=(15,15))
plt.scatter(y_test, y_predicted, c='crimson')
#plt.yscale('log')
#plt.xscale('log')

p1 = max(max(y_predicted), max(y_test))
p2 = min(min(y_predicted), min(y_test))
plt.plot([p1, p2], [p1, p2], 'b-')
plt.xlabel('True Values', fontsize=20)
plt.ylabel('Predictions', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.axis('equal')
plt.savefig('RegressionActualVsPredicted_No_SelectorsFullSet.png',dpi=300)
X_train = train[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per']]

X_test = test[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per']]

y_train = train.completion_rate_150pct

y_test = test.completion_rate_150pct

X_train.head(1)
est_fte exp_instruc_total_per exp_acad_supp_total_per exp_student_serv_total_per exp_inst_supp_total_per
11618 0.804298 0.092503 0.049112 0.06432 0.055147
from sklearn.covariance import EllipticEnvelope
envelope1 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_train)

outliers1 = envelope1.predict(X_train)==-1  


X_clean1 = X_train[~outliers1]  
y_clean1 = y_train[~outliers1]

print(f"Num of outliers = {np.sum(outliers1)}")
Num of outliers = 353
pipe20 = Pipeline([
    ('LinearRegression', LinearRegression())
])
# Configure the parameters for grid search
param_grid20 = {
    'LinearRegression__n_jobs': [1,3,5]
}
model_grid20 = GridSearchCV(pipe20, param_grid20, cv = 5, n_jobs = 5)
model_grid20.fit(X_clean1, y_clean1)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('LinearRegression',
                                        LinearRegression())]),
             n_jobs=5, param_grid={'LinearRegression__n_jobs': [1, 3, 5]})
model_grid20.best_params_ ##Best hyperparameters
{'LinearRegression__n_jobs': 1}
print(model_grid20.best_score_)
0.3391828595864991
y_predicted = model_grid20.predict(X_test)
explained_variance_score(y_test, y_predicted)
-0.861578290250389
##r2 score
r2_score(y_test, y_predicted)
-0.8757157627565033
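A negative R² means the model does worse on the test set than simply predicting the mean of y_test. A minimal numpy sketch of the definition, with hypothetical values:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
bad_pred = np.array([3.0, 3.0, 0.0])             ## a model worse than the mean
mean_pred = np.full_like(y_true, y_true.mean())  ## baseline: always predict the mean

def r2(y, yhat):
    ## R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(r2(y_true, mean_pred))  ## 0.0: the mean baseline
print(r2(y_true, bad_pred))   ## negative: worse than the baseline
```

So the -0.88 above indicates the five-feature model overfits badly relative to the full-feature regression.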
plt.figure(figsize=(15,15))
plt.scatter(y_test, y_predicted, c='crimson')
#plt.yscale('log')
#plt.xscale('log')

p1 = max(max(y_predicted), max(y_test))
p2 = min(min(y_predicted), min(y_test))
plt.plot([p1, p2], [p1, p2], 'b-')
plt.xlabel('True Values', fontsize=20)
plt.ylabel('Predictions', fontsize=20)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.axis('equal')
plt.savefig('RegressionActualVsPredicted_No_Selectors.png',dpi=300)

Low High

X_train = train[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per']]

X_test = test[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per']]

y_train = train.Grad_Rates_Two_Classes

y_test = test.Grad_Rates_Two_Classes

X_train.head(1)
est_fte exp_instruc_total_per exp_acad_supp_total_per exp_student_serv_total_per exp_inst_supp_total_per
11618 0.804298 0.092503 0.049112 0.06432 0.055147
from sklearn.covariance import EllipticEnvelope
envelope1 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_train)

outliers1 = envelope1.predict(X_train)==-1  


X_clean1 = X_train[~outliers1]  
y_clean1 = y_train[~outliers1]

print(f"Num of outliers = {np.sum(outliers1)}")
Num of outliers = 353
pipe8 = Pipeline([
    ('RandomForestClassifier', RandomForestClassifier()) 
])

param_grid8 = {
    'RandomForestClassifier__n_estimators': [50, 100, 150],
    'RandomForestClassifier__criterion': ['gini', 'entropy'],
    'RandomForestClassifier__max_depth': [8]
    
}
    
model_grid8 = GridSearchCV(pipe8, param_grid8, cv = 5, n_jobs = 5)
model_grid8.fit(X_clean1, y_clean1)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('RandomForestClassifier',
                                        RandomForestClassifier())]),
             n_jobs=5,
             param_grid={'RandomForestClassifier__criterion': ['gini',
                                                               'entropy'],
                         'RandomForestClassifier__max_depth': [8],
                         'RandomForestClassifier__n_estimators': [50, 100,
                                                                  150]})
model_grid8.best_params_ ##Best hyperparameters
{'RandomForestClassifier__criterion': 'gini',
 'RandomForestClassifier__max_depth': 8,
 'RandomForestClassifier__n_estimators': 100}
y_predicted = model_grid8.predict(X_test)
y_predicted[1:20]
print(confusion_matrix(y_test, y_predicted))
print(classification_report(y_test, y_predicted))
[[762 195]
 [241 758]]
              precision    recall  f1-score   support

           0       0.76      0.80      0.78       957
           1       0.80      0.76      0.78       999

    accuracy                           0.78      1956
   macro avg       0.78      0.78      0.78      1956
weighted avg       0.78      0.78      0.78      1956

print(model_grid8.best_score_)
0.7620134638970123
accuracy_score(y_test, model_grid8.predict(X_test))
0.7770961145194274
##get predictions
y_predicted = model_grid8.predict(X_test)

##Auc Score
fpr, tpr, thresholds = roc_curve(y_test, y_predicted)
auc(fpr, tpr)
0.7774985016364326
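Note that `roc_curve` here is fed hard class predictions, which collapses the ROC curve to a single operating point; feeding predicted probabilities uses the full ranking and usually gives a higher, more faithful AUC. A minimal sketch with hypothetical scores:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
proba = [0.1, 0.6, 0.4, 0.9]            ## hypothetical predicted P(class 1)
labels = [int(p > 0.5) for p in proba]  ## hard predictions at a 0.5 threshold

print(roc_auc_score(y_true, proba))   ## uses the full ranking
print(roc_auc_score(y_true, labels))  ## only one threshold survives
```

On the fitted model, `model_grid8.predict_proba(X_test)[:, 1]` should supply the probability scores.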
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(model_grid8, X_test, y_test, normalize='all', display_labels=['Low', 'High'], ax=ax)
fig.savefig('Random_LH_Fig7F.png', dpi=300)

Decision Tree

clf = DecisionTreeClassifier(max_depth = 4, random_state=21)
model = clf.fit(X_clean1, y_clean1)
pred = clf.predict(X_test)
acc = accuracy_score(y_test, pred)
acc
0.7305725971370143
feature_cols = X_clean1.columns.tolist()
target_names = ['Low', 'High']
X = X_clean1
y = y_clean1
fig = plt.figure(figsize=(35,20))
_ = tree.plot_tree(clf, fontsize=12,
                   feature_names=feature_cols,  
                   class_names=target_names,
                   filled=True)
fig.savefig('Tree2ClassNoSelectivityF.png',dpi=300)

Dummy Classifier

DumPipe = Pipeline([
    ('DummyClassifier', DummyClassifier()) 
])

Dum_param_grid = {
    'DummyClassifier__strategy': ['most_frequent', 'prior', 'stratified', 'uniform'],
}
    
dum_model = GridSearchCV(DumPipe, Dum_param_grid, cv = 5, n_jobs = 5)
dum_model.fit(X_clean1, y_clean1)

print("Best Hyperparameters:",dum_model.best_params_) ##Best hyperparameters
Best Hyperparameters: {'DummyClassifier__strategy': 'most_frequent'}
y_predicted = dum_model.predict(X_test)
y_predicted[1:20]
print(confusion_matrix(y_test, y_predicted))
print(classification_report(y_test, y_predicted))


##print model score
print("Model Score:", dum_model.best_score_)

##Accuracy Score
print("Accuracy Score:",accuracy_score(y_test, dum_model.predict(X_test)))

##AUC
##get predictions
y_predicted = dum_model.predict(X_test)

##Auc Score
fpr, tpr, thresholds = roc_curve(y_test, y_predicted)
print("AUC Score:",auc(fpr, tpr))
[[957   0]
 [999   0]]
              precision    recall  f1-score   support

           0       0.49      1.00      0.66       957
           1       0.00      0.00      0.00       999

    accuracy                           0.49      1956
   macro avg       0.24      0.50      0.33      1956
weighted avg       0.24      0.49      0.32      1956

Model Score: 0.5149784696510454
Accuracy Score: 0.4892638036809816
AUC Score: 0.5
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
fig, ax = plt.subplots(figsize=(20, 20))  ##pass ax so the display draws on this figure instead of creating its own
ConfusionMatrixDisplay.from_estimator(dum_model, X_test, y_test, normalize='all', display_labels=['Low', 'High'], ax=ax)
fig.savefig('Dummy_LH_Fig7F.png', dpi=300)

Low Medium High

X_train = train[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per']]

X_test = test[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per']]

y_train = train.Grad_Rates_Three_Classes

y_test = test.Grad_Rates_Three_Classes

X_train.head(1)
        est_fte  exp_instruc_total_per  exp_acad_supp_total_per  exp_student_serv_total_per  exp_inst_supp_total_per
11618  0.804298               0.092503                 0.049112                     0.06432                 0.055147
from sklearn.covariance import EllipticEnvelope

# Fit a robust Gaussian envelope and flag the 3% most extreme rows as outliers
envelope1 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_train)
outliers1 = envelope1.predict(X_train)==-1

# Drop the flagged rows from both features and target
X_clean1 = X_train[~outliers1]
y_clean1 = y_train[~outliers1]

print(f"Num of outliers = {np.sum(outliers1)}")
Num of outliers = 353
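`EllipticEnvelope` fits a robust Gaussian estimate of the data and flags the points farthest from its center (by Mahalanobis distance) until roughly the `contamination` fraction is reached. A self-contained sketch on toy 2-D data (the synthetic arrays below are illustrative, not the IPEDS features):

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(21)
# 200 inlier points around the origin plus 6 obvious outliers far away
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.normal(8, 0.5, size=(6, 2))])

env = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X)
labels = env.predict(X)             # +1 = inlier, -1 = outlier
n_out = int(np.sum(labels == -1))
print(n_out)                        # contamination=0.03 flags roughly 3% of 206 points
```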
pipe9 = Pipeline([
    ('RandomForestClassifier', RandomForestClassifier()) 
])

param_grid9 = {
    'RandomForestClassifier__n_estimators': [50, 100, 150],
    'RandomForestClassifier__criterion': ['gini', 'entropy'],
    'RandomForestClassifier__max_depth': [8]
    
}
    
model_grid9 = GridSearchCV(pipe9, param_grid9, cv = 5, n_jobs = 5)
model_grid9.fit(X_clean1, y_clean1)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('RandomForestClassifier',
                                        RandomForestClassifier())]),
             n_jobs=5,
             param_grid={'RandomForestClassifier__criterion': ['gini',
                                                               'entropy'],
                         'RandomForestClassifier__max_depth': [8],
                         'RandomForestClassifier__n_estimators': [50, 100,
                                                                  150]})
model_grid9.best_params_ ##Best hyperparameters
{'RandomForestClassifier__criterion': 'entropy',
 'RandomForestClassifier__max_depth': 8,
 'RandomForestClassifier__n_estimators': 100}
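`best_params_` only names the winner; `cv_results_` holds the mean CV score of every candidate, which is worth inspecting when the margin between `gini` and `entropy` is small. A self-contained sketch on synthetic data (not the IPEDS features):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=21)

grid = GridSearchCV(
    RandomForestClassifier(max_depth=8, random_state=21),
    {'criterion': ['gini', 'entropy'], 'n_estimators': [50, 100]},
    cv=5)
grid.fit(X, y)

# One row per candidate: its parameters, mean CV score, and rank
results = pd.DataFrame(grid.cv_results_)[
    ['param_criterion', 'param_n_estimators', 'mean_test_score', 'rank_test_score']]
print(results.sort_values('rank_test_score'))
```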
y_predicted = model_grid9.predict(X_test)
y_predicted[:20]  # preview the first 20 predictions
print(confusion_matrix(y_test, y_predicted))
print(classification_report(y_test, y_predicted))
[[459 137  59]
 [165 332 151]
 [ 49 125 479]]
              precision    recall  f1-score   support

           0       0.68      0.70      0.69       655
           1       0.56      0.51      0.53       648
           2       0.70      0.73      0.71       653

    accuracy                           0.65      1956
   macro avg       0.65      0.65      0.65      1956
weighted avg       0.65      0.65      0.65      1956

print(model_grid9.best_score_)
0.6210124859815391
accuracy_score(y_test, model_grid9.predict(X_test))
0.6492842535787321
#Calculate the y_score
y_score = model_grid9.predict_proba(X_test)

##get auc score
roc_auc_score(y_test, y_score, multi_class='ovr')
0.8173649951180163
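`roc_auc_score(..., multi_class='ovr')` is the unweighted mean of one-vs-rest binary AUCs, one per class; the per-class values can be recovered by binarizing the labels and scoring each probability column. A sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=600, n_features=5, n_informative=3,
                           n_classes=3, random_state=21)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=21)

proba = RandomForestClassifier(random_state=21).fit(X_tr, y_tr).predict_proba(X_te)

# One binary one-vs-rest AUC per class, then the macro average
y_bin = label_binarize(y_te, classes=[0, 1, 2])
per_class = [roc_auc_score(y_bin[:, k], proba[:, k]) for k in range(3)]
macro = roc_auc_score(y_te, proba, multi_class='ovr')  # mean of per_class

print(per_class, macro)
```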
fig, ax = plt.subplots(figsize=(20,20))
cmd = ConfusionMatrixDisplay.from_estimator(model_grid9, X_test, y_test, normalize = 'all', display_labels=['Low', 'Medium', 'High'], ax=ax)
cmd.figure_.savefig('Random_LMH_Fig8F.png',dpi=300)

Decision Tree

clf = DecisionTreeClassifier(max_depth = 4, random_state=21)
model = clf.fit(X_clean1, y_clean1)
pred = clf.predict(X_test)
acc = accuracy_score(y_test, pred)
acc
0.5547034764826176
feature_cols = list(X_clean1.columns)
target_names = ['Low', 'Medium', 'High']
fig = plt.figure(figsize=(45,20))
_ = tree.plot_tree(clf, fontsize=12,
                   feature_names=feature_cols,  
                   class_names=target_names,
                   filled=True)
fig.savefig('Tree3ClassNoSelectivityF.png',dpi=300)
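Besides the plotted tree, the same splits can be dumped as indented text with `export_text`, and `feature_importances_` summarizes which features drive them. The sketch below uses synthetic data and made-up feature names, not the IPEDS columns:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=21)
names = [f'feat_{i}' for i in range(5)]

toy_clf = DecisionTreeClassifier(max_depth=4, random_state=21).fit(X, y)

# Indented if/else view of the fitted tree
rules = export_text(toy_clf, feature_names=names)
print(rules)

# Importances sum to 1 and rank the features by how much they reduce impurity
imp = dict(zip(names, toy_clf.feature_importances_))
print(sorted(imp.items(), key=lambda kv: -kv[1]))
```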

Dummy Classifier

DumPipe = Pipeline([
    ('DummyClassifier', DummyClassifier()) 
])

Dum_param_grid = {
    'DummyClassifier__strategy': ['most_frequent', 'prior', 'stratified', 'uniform'],
}
    
dum_model = GridSearchCV(DumPipe, Dum_param_grid, cv = 5, n_jobs = 5)
dum_model.fit(X_clean1, y_clean1)

print("Best Hyperparameters:",dum_model.best_params_) ##Best hyperparameters
Best Hyperparameters: {'DummyClassifier__strategy': 'most_frequent'}
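With `strategy='most_frequent'` the dummy baseline ignores the features and always predicts the majority training class, which is why its confusion matrix below has a single nonzero column and its AUC sits at 0.5. A minimal sketch:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((6, 1))               # features are ignored entirely
y = np.array([0, 0, 0, 1, 1, 2])   # class 0 is the majority

dum = DummyClassifier(strategy='most_frequent').fit(X, y)
pred = dum.predict(np.zeros((4, 1)))
print(pred)  # every prediction is the majority class, 0

# predict_proba is the same constant row for every sample, hence AUC = 0.5
print(dum.predict_proba(np.zeros((2, 1))))
```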
y_predicted = dum_model.predict(X_test)
y_predicted[:20]  # preview the first 20 predictions
print(confusion_matrix(y_test, y_predicted))
print(classification_report(y_test, y_predicted))


##print model score
print("Model Score:", dum_model.best_score_)

##Accuracy Score
print("Accuracy Score:",accuracy_score(y_test, dum_model.predict(X_test)))

#Calculate the y_score
y_score = dum_model.predict_proba(X_test)

print("AUC Score", roc_auc_score(y_test, y_score, multi_class='ovr'))
[[655   0   0]
 [648   0   0]
 [653   0   0]]
              precision    recall  f1-score   support

           0       0.33      1.00      0.50       655
           1       0.00      0.00      0.00       648
           2       0.00      0.00      0.00       653

    accuracy                           0.33      1956
   macro avg       0.11      0.33      0.17      1956
weighted avg       0.11      0.33      0.17      1956

Model Score: 0.3489413891339466
Accuracy Score: 0.33486707566462165
AUC Score 0.5
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
fig, ax = plt.subplots(figsize=(20,20))
cmd = ConfusionMatrixDisplay.from_estimator(dum_model, X_test, y_test, normalize = 'all', display_labels=['Low', 'Medium', 'High'], ax=ax)
cmd.figure_.savefig('Dummy_LMH_Fig8F.png',dpi=300)

Quartiles
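A quartile target like `Grad_Rates_Quartiles` is typically built with `pd.qcut`, which cuts a continuous column at its empirical quartiles into four near-equal bins. How the label was actually constructed is an assumption here; the sketch uses synthetic rates:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(21)
rates = pd.Series(rng.uniform(0, 100, size=1000))  # stand-in for a graduation-rate column

# Four labels 0..3, one per quartile of the empirical distribution
quartiles = pd.qcut(rates, q=4, labels=[0, 1, 2, 3])

counts = quartiles.value_counts().sort_index()
print(counts)  # ~250 rows per bin
```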

X_train = train[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per']]

X_test = test[['est_fte', 'exp_instruc_total_per', 'exp_acad_supp_total_per', 'exp_student_serv_total_per', 'exp_inst_supp_total_per']]

y_train = train.Grad_Rates_Quartiles

y_test = test.Grad_Rates_Quartiles

X_train.head(1)
        est_fte  exp_instruc_total_per  exp_acad_supp_total_per  exp_student_serv_total_per  exp_inst_supp_total_per
11618  0.804298               0.092503                 0.049112                     0.06432                 0.055147
from sklearn.covariance import EllipticEnvelope

# Same 3% outlier screen as before, refit for the quartile target
envelope1 = EllipticEnvelope(support_fraction=1, contamination=0.03).fit(X_train)
outliers1 = envelope1.predict(X_train)==-1

X_clean1 = X_train[~outliers1]
y_clean1 = y_train[~outliers1]

print(f"Num of outliers = {np.sum(outliers1)}")
Num of outliers = 353
pipe10 = Pipeline([
    ('RandomForestClassifier', RandomForestClassifier()) 
])

param_grid10 = {
    'RandomForestClassifier__n_estimators': [50, 100, 150],
    'RandomForestClassifier__criterion': ['gini', 'entropy'],
    'RandomForestClassifier__max_depth': [8]
    
}
    
model_grid10 = GridSearchCV(pipe10, param_grid10, cv = 5, n_jobs = 5)
model_grid10.fit(X_clean1, y_clean1)
GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('RandomForestClassifier',
                                        RandomForestClassifier())]),
             n_jobs=5,
             param_grid={'RandomForestClassifier__criterion': ['gini',
                                                               'entropy'],
                         'RandomForestClassifier__max_depth': [8],
                         'RandomForestClassifier__n_estimators': [50, 100,
                                                                  150]})
model_grid10.best_params_ ##Best hyperparameters
{'RandomForestClassifier__criterion': 'entropy',
 'RandomForestClassifier__max_depth': 8,
 'RandomForestClassifier__n_estimators': 150}
y_predicted = model_grid10.predict(X_test)
y_predicted[:20]  # preview the first 20 predictions
print(confusion_matrix(y_test, y_predicted))
print(classification_report(y_test, y_predicted))
[[313 101  60  25]
 [129 195 107  41]
 [ 68 108 209 100]
 [ 19  27 116 338]]
              precision    recall  f1-score   support

           0       0.59      0.63      0.61       499
           1       0.45      0.41      0.43       472
           2       0.42      0.43      0.43       485
           3       0.67      0.68      0.67       500

    accuracy                           0.54      1956
   macro avg       0.53      0.54      0.54      1956
weighted avg       0.54      0.54      0.54      1956

print(model_grid10.best_score_)
0.5306160481563553
accuracy_score(y_test, model_grid10.predict(X_test))
0.5393660531697342
#Calculate the y_score
y_score = model_grid10.predict_proba(X_test)

##get auc score
roc_auc_score(y_test, y_score, multi_class='ovr')
0.808196948996172
fig, ax = plt.subplots(figsize=(20,20))
cmd = ConfusionMatrixDisplay.from_estimator(model_grid10, X_test, y_test, normalize = 'all', display_labels=['Q1', 'Q2', 'Q3', 'Q4'], ax=ax)
cmd.figure_.savefig('Random_Quart_Fig9F.png',dpi=300)

Decision Tree

clf = DecisionTreeClassifier(max_depth = 4, random_state=21)
model = clf.fit(X_clean1, y_clean1)
pred = clf.predict(X_test)
acc = accuracy_score(y_test, pred)
acc
0.48517382413087934
feature_cols = list(X_clean1.columns)
target_names = ['Q1', 'Q2', 'Q3', 'Q4']
fig = plt.figure(figsize=(45,20))
_ = tree.plot_tree(clf, fontsize=12,
                   feature_names=feature_cols,  
                   class_names=target_names,
                   filled=True)
fig.savefig('Tree4ClassNoSelectivityF.png',dpi=300)

Dummy Classifier

DumPipe = Pipeline([
    ('DummyClassifier', DummyClassifier()) 
])

Dum_param_grid = {
    'DummyClassifier__strategy': ['most_frequent', 'prior', 'stratified', 'uniform'],
}
    
dum_model = GridSearchCV(DumPipe, Dum_param_grid, cv = 5, n_jobs = 5)
dum_model.fit(X_clean1, y_clean1)

print("Best Hyperparameters:",dum_model.best_params_) ##Best hyperparameters
Best Hyperparameters: {'DummyClassifier__strategy': 'most_frequent'}
y_predicted = dum_model.predict(X_test)
y_predicted[:20]  # preview the first 20 predictions
print(confusion_matrix(y_test, y_predicted))
print(classification_report(y_test, y_predicted))


##print model score
print("Model Score:", dum_model.best_score_)

##Accuracy Score
print("Accuracy Score:",accuracy_score(y_test, dum_model.predict(X_test)))

#Calculate the y_score
y_score = dum_model.predict_proba(X_test)

print("AUC Score", roc_auc_score(y_test, y_score, multi_class='ovr'))
[[499   0   0   0]
 [472   0   0   0]
 [485   0   0   0]
 [500   0   0   0]]
              precision    recall  f1-score   support

           0       0.26      1.00      0.41       499
           1       0.00      0.00      0.00       472
           2       0.00      0.00      0.00       485
           3       0.00      0.00      0.00       500

    accuracy                           0.26      1956
   macro avg       0.06      0.25      0.10      1956
weighted avg       0.07      0.26      0.10      1956

Model Score: 0.2681191065541948
Accuracy Score: 0.2551124744376278
AUC Score 0.5
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
fig, ax = plt.subplots(figsize=(20,20))
cmd = ConfusionMatrixDisplay.from_estimator(dum_model, X_test, y_test, normalize = 'all', display_labels=['Q1', 'Q2', 'Q3', 'Q4'], ax=ax)
cmd.figure_.savefig('Dummy_Quart_Fig9F.png',dpi=300)